Abstract

The MLE (Maximum Likelihood Estimate) for a multinomial model is proportional to the data. We
call such an estimate an eigenestimate and its relationship to the data the eigenstructure. When the
multinomial model is generalized to deal with data arising from incomplete or censored categorical counts,
we would naturally look for this eigenstructure between the MLE and the data. The paper finds the algebraic
representation of the eigenstructure (put as Eqn (2.1)), with which the intuition is visualized geometrically
(Figures 2.2 and 4.3) and elaborated in a theory (Section 4). The eigenestimate constructed from the
eigenstructure must be a stationary point of the likelihood, a result proved in Theorem 4.42. On the
bridge between the algebraic definition of Eqn (2.1) and the proof of Theorem 4.42, we have exploited an
elementary inequality (Lemma 3.1) that governs the primitive cases, defined the thick objects of fragment and
slice, which can be assembled like mechanical parts (Definition 4.1), proved a few intermediary results that
help build up the intuition (Section 4), conjectured the universal existence of an eigenestimate (Conjecture
4.32), established a criterion for boundary regularity (Criterion 4.37), and paved the way (the Trivial Slicing
Algorithm (TSA)) for the derivation of the Weaver algorithms (Section 5), which find the eigenestimate by
using it to reconstruct the observed counts through the eigenstructure; the reconstruction is iterative but
derivative-free and matrix-inversion-free. As a new addition to the current body of algorithmic methods, the
Weaver algorithms craftily tighten threads that are woven on a rectangular grid (Figure 2.3), and are one
incarnation of the TSA. Finally, we put our method in the context of some existing methods (Section 6).
Software is attached and demonstrated at http://hku.hk/jdong/eigenstruct2013a.html

1 Introduction 

1.1. A metaphor. Suppose we have an array of chunks of a certain material concealed in separate but identical
boxes, so that we can neither see their sizes nor weigh them on a scale. The material is a crystal and, as usual, is
made of atoms that are distributed homogeneously (with certain symmetry) inside the material. The boxes are
labeled 1 to n. Although we don't have direct access to the chunks in the boxes, we do hear the screams of a tiny
demon living freely inside the n chunks. The demon moves only by hopping from one place to another.
It can hop from one chunk to another in a manner that is instant, impulsive, and free; it can also hop from
one place to another inside the same chunk in the same manner. In short,
to the demon, the internal space of the n chunks is a connected piece dotted homogeneously with atoms labeled
1 to n. To us, the only two important things to know are that the demon hops constantly and randomly in
its living room (though sometimes a portion of the living room may be temporarily unavailable to the demon,
due to our visitors' meddling with the boxes, for example) and that, every time it collides with an atom
of the crystal, the demon screams out the label of the chunk and we hear it (though we may not always hear
the label clearly). Using the demon's screams, we would like to build a score for every chunk so that, through
the scores, we gain a good sense of how the mass of each chunk compares to the masses of the others. To do
this, we build a counting machine to count the demon's screams of each label. When the machine does not hear
exactly which label the demon screams, it creates a grand-label that groups all labels possible for this scream
and increments its count by 1. When the machine sees that visitors are meddling with the boxes, it excludes the boxes
being meddled with by creating a grand-label that groups all the unmeddled boxes and counts negatively at this
grand-label by the total number of demon screams during the meddling.



1.2. Using point sets to represent the chunks. The metaphor is an instance of the following class of phenomena: a
generalized type of counts (areas, volumes, lengths, weights, scores, magnitudes, degrees, intensities, etc.) is
recorded for an array of objects. We represent each of these objects as a point set, a member of a set family, so
that we can talk about the "unionic objects" and the "intersectional objects", and record their counts. The
strong assumption that starts our modeling is that the points are indistinguishable, each equipped with a unit
mass and void of internal structure, and that the total mass of the family is finite, so that we can pick the proper
unit system to make this total equal to 1 and the masses become probabilities. Each point set is characterized
by its probability, gained from the points it consists of. It is then the probability of each set that we initially
aim to deduce, using the recorded counts of the labels screamed out by the demon as the only source of
information. In the maximum likelihood paradigm this means looking for the probability configuration with
maximum likelihood.

1.3. Three types of counts and the question of their unifiability. 

1.3.1. Ionic counts. If the recorded counts are the actual counts of events and no two events
occur in union, then this is the multinomial model (or Dirichlet model, depending on whether the bases or the
exponents are fixed). In the multinomial/Dirichlet model the counts can be regarded as the eigen (direct, linear,
proportional, true) manifestation of the underlying probability masses. This is because there exists a unique
probability configuration that is both the maximum likelihood configuration and on the line of the counts
vector. We call the events in this first case the ionic events; these correspond to the labels heard exactly by us
from the screaming demon.

1.3.2. Unionic counts. If the recorded counts include those of some unionic events (ionic events combined 
logically by OR), then we are compelled to answer the following question: 

Can we reconstruct the actual counts in the similar way (the eigen way) from the maximum likelihood 
probability configuration? 

The unionic counts correspond to the labels heard ambiguously. 

1.3.3. Conditional counts. The same question must be answered if we are in a third situation: some of the events 
are excluded during particular counting sessions, causing the probability to redistribute onto the conditional 
sample space. In this third case we call the events conditional events and these correspond to the labels heard 
when some boxes are being meddled with. 

1.4. A very simple example introducing the new technique. For example, we want to find the maximum
likelihood probability configuration $(p_1, p_2, p_3)$ 1 for the multinomial-like kernel:

$$x_1^2 x_2^2 x_3^2 (x_1 + x_2)^4 .$$

A simple manipulation gives $(p_1, p_2, p_3) = (0.4, 0.4, 0.2)$.

The new technique to solve the MLE is eigenreconstruction. Briefly, it means generating the following system
properly and solving it. The algebraic rules underpinning its generation are given in the next section (Eqn (2.1)).
For the moment, we prepare ourselves by asking if we can get a coinciding $(p_1, p_2, p_3)$:

$$\begin{cases} x_1 a = 2 \\ x_2 a = 2 \\ x_3 (a + b) = 2 \\ (x_1 + x_2)\, b = 4 \\ x_1 + x_2 + x_3 = 1 \end{cases}$$

For this example, the answer is yes (with a = b = 5). Eigenreconstructibility at least requires that all the counts
lie in the linear space spanned by the coordinates of the maximum likelihood probability vector. From a more
geometric perspective, when we treat the coefficients as variables too, this is a system of polynomial equations,
each of total degree at most 2, and each of these polynomial equations defines a surface consisting of the
polynomial's roots. The intersection of these surfaces, called the algebraic variety defined by the
polynomial system, is the solution set of the probability and the coefficients.
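
As a quick numerical check of this example, the following Python sketch substitutes $x = (0.4, 0.4, 0.2)$ and $a = b = 5$ into the system and compares the log-kernel at that point with random points of the simplex. It is only an illustration; the helper names `log_kernel` and `residuals` are ours and do not come from the paper or its software.

```python
import math
import random

def log_kernel(x1, x2, x3):
    """Log of the kernel x1^2 * x2^2 * x3^2 * (x1 + x2)^4."""
    return (2 * math.log(x1) + 2 * math.log(x2) + 2 * math.log(x3)
            + 4 * math.log(x1 + x2))

def residuals(x, a, b):
    """Left side minus right side for each equation of the system."""
    x1, x2, x3 = x
    return [x1 * a - 2,
            x2 * a - 2,
            x3 * (a + b) - 2,
            (x1 + x2) * b - 4,
            x1 + x2 + x3 - 1]

p = (0.4, 0.4, 0.2)
res = residuals(p, a=5, b=5)

# Compare the candidate with random points of the open simplex.
random.seed(0)
best = log_kernel(*p)
beaten = False
for _ in range(10000):
    u = [random.random() + 1e-9 for _ in range(3)]
    total = sum(u)
    y = [v / total for v in u]
    if log_kernel(*y) > best + 1e-12:
        beaten = True

print("residuals:", res)
print("some random point beats p:", beaten)
```

Because every exponent of this kernel is positive, the log-kernel is concave on the simplex, so a point satisfying the system is the global maximizer there; the random search is only a sanity check of that fact.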

In general, the answer is probably still yes, and this makes the maximum likelihood probability configuration
not only the weightiest but also intuitively convincing, because it has a structure; this is the philosophy
of this paper. The insight gained from the effort of trying to understand eigenreconstructibility algebraically,
computationally, and geometrically will lend explanation to questions like why the kernels x\x 2 x\{x\ + x 2 )~ i 
and x^x^x^ {x\ + X2) share the same maximum likelihood probability configuration ( y^, j^, |). 

1 In this paper, we reserve the symbol p for the eigenestimate and, later, the intersection of the compatibility axis with the 
simplex, that is, p is the solution of the following system. We use x as the variable, presumably simplicial if not explicitly said, 
that is trying to achieve the eigenestimate p. The reason is, before we conclude anything, we have to always remind ourselves that 
the system need not be consistent and p need not exist. 
2 Structure and Game



2.1. $\rho(E)$ and $p(E)$. Denote the finite family of mutually exclusive sets by $\mathcal{E} = \{E_i : i = 1, 2, \ldots, n\}$ and by
$\mathcal{A} = \sigma\mathcal{E}$ the collection of all possible unions of sets in $\mathcal{E}$ (with the addition of $\emptyset$), which is, in this case, the
$\sigma$-algebra generated by $\mathcal{E}$. Thus $(\bigcup\mathcal{E}, \sigma\mathcal{E})$ is a (finite) measurable space.

Denote the generalized counts by $\rho : \mathcal{A} \to \mathbb{R}$, whence the recorded counts by $\rho(\bigcup\mathcal{F})$, for some subcollection
$\mathcal{F} \subset \mathcal{E}$, abbreviated as $\rho_i$, $\rho_{ij}$, $\rho_{ijk}$, like $\rho_1 = \rho(E_1)$, $\rho_{12} = \rho(E_1 \cup E_2)$, $\rho_{123} = \rho(E_1 \cup E_2 \cup E_3)$, ....

Denote the probability by $p : \mathcal{A} \to [0, 1]$ and with similar abbreviations we write $p_1 = p(E_1)$, $p_{12} = p(E_1 \cup E_2)$,
$p_{123} = p(E_1 \cup E_2 \cup E_3)$, and so on.

Call the $E_i$'s the ionic events, and unions of more than one $E_i$ the unionic events. Thus the subscript shared
by both $\rho$ and $p$ represents an event.

Remark 2.1. The connection to the familiar settings of categorical counts is the following. First, a category
(ionic or unionic) is a container of (generalized) counts. An ionic category is a category having no non-empty
subcategories other than itself. Here, a category is represented by a (measurable) set characterized by its
probability. In the above $E_i$-$\mathcal{E}$-$\mathcal{A}$ notation, an ionic category is represented by an $E_i$, the array of all ionic
categories is represented by $\mathcal{E}$, and the collection of all (ionic or unionic) categories is represented by $\mathcal{A}$. An
element of $\mathcal{A}$ is $E$ (subscript dropped) and $\rho(E)$ is the counts that fall into $E$. The extraordinary thing to note
is that when $\rho(E)$ is negative, we say that $E$ is counted negatively.

2.2. The existence of thick slices and the eigenstructure. $\rho$ may take an additional superscript $e$ to
announce its membership in a sub-product $e$ called a slice, which possesses a property called co-thickness 3 , a real
number denoted by $\tau^e$. For instance, $\rho_{12}^{(2)}$ denotes the part of the counts $\rho_{12}$ that is allocated to the 2nd slice,
which possesses co-thickness $\tau^{(2)}$.

With the earlier example, the likelihood $x_1^2 x_2^2 x_3^2 (x_1 + x_2)^4$ consists of two slices, $x_1^2 x_2^2 x_3$ and $(x_1 + x_2)^4 x_3$. We
use the superscript notation to write $\rho_1^{(1)} = 2$, $\rho_2^{(1)} = 2$, $\rho_3^{(1)} = 1$, $\rho_3^{(2)} = 1$, $\rho_{12}^{(2)} = 4$ and $\tau^{(1)} = 5$, $\tau^{(2)} = 5$; 4
note that $\rho_1 = \rho_1^{(1)}$, $\rho_2 = \rho_2^{(1)}$, and $\rho_{12} = \rho_{12}^{(2)}$, as the three events 1, 2, and 12 each have membership in only one slice. An
illustration is the diagram in Figure 2.1.




Figure 2.1: Illustrating the notation of $\rho$ by a two-slice, four-event likelihood



The algebraic understanding of the relationship between $\rho$ and $p$ reduces to two rules:

$$\rho_E^{e} = \tau^{e}\, p_E, \qquad \rho_E = \sum_{e \ni E} \rho_E^{e} \tag{2.1}$$

for all $E \in \mathcal{A}$. The symbol $e$ still denotes a slice and each $E$ pertains to at least one $e$. The summation in the
second rule is carried over all slices that $E$ pertains to. These rules represent the structural relationship
between $\rho$ and $p$. We name the relationship by calling $\rho$ an eigenreconstruction of $p$ and $p$ the eigenestimate
based on $\rho$.

These two seemingly simple algebraic rules surprise us by implying that, when the $\rho$'s are fixed by observations,
any satisfactory $p : \mathcal{A} \to \mathbb{R}$ is a stationary point of the likelihood given by the $\rho$'s, an analytic property
(Theorem 4.42).

A geometric understanding starts from observing that $\tau^e$ is void of a subscript denoting events, which
means it is one thickness across all components $E$ in the slice $e$. An illustration using the same likelihood
$x_1^2 x_2^2 x_3^2 (x_1 + x_2)^4$ as before is given in Figure 2.2. (A more extensive illustration is Figure 4.3.)



2 In the sense not only that the counts need not be integers, but also that the counts can be negative — in that case it is also a 
generalized measure under the physical image of an electric charge. 

3 Exact meaning of slice and its various properties including co-thickness will be developed in a later section. 
4 The 2 co-thicknesses are equal by coincidence. Computing co-thickness will be explained later. 
[Figure 2.2 shows the two slices $x_1^2 x_2^2 x_3$ and $(x_1 + x_2)^4 x_3$ superposed into the likelihood $x_1^2 x_2^2 x_3^2 (x_1 + x_2)^4$.]

Figure 2.2: Geometric understanding of the relationship between $\rho$ and $p$

2.3. Realistic likelihoods and a Game of Optimization. The following examples are demonstrated at
http://hku.hk/jdong/eigenstruct2013a.html.

Example 2.2. Five players are playing a series of Ping-Pong matches. Their first 6 match scores are shown in
Table 2.1. Based on the match scores ($\rho$), it is required to determine a final score for each of the five players
(normalizing the final scores gives the $p$).



Table 2.1: n-ary comparison from paired scores from Ping-Pong matches
Match   A-B     C-D     A-E     B-C     D-E     A-D
Score   21-16   18-21   19-21   25-27   22-20   21-18

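Based on the Bradley-Terry model, with $\sum_i x_i = 1$, the likelihood is

$$\frac{x_1^{21} x_2^{16}}{(x_1 + x_2)^{37}} \cdot \frac{x_3^{18} x_4^{21}}{(x_3 + x_4)^{39}} \cdot \frac{x_1^{19} x_5^{21}}{(x_1 + x_5)^{40}} \cdot \frac{x_2^{25} x_3^{27}}{(x_2 + x_3)^{52}} \cdot \frac{x_4^{22} x_5^{20}}{(x_4 + x_5)^{42}} \cdot \frac{x_1^{21} x_4^{18}}{(x_1 + x_4)^{39}}$$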
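
To make the bookkeeping behind this likelihood concrete, here is a small Python sketch that collects the match scores of Table 2.1 into ionic exponents (one per player) and negative unionic exponents (one per match). The function names `collect` and `log_likelihood`, and the arrangement into the triple `(a, rho, A)`, are our own illustrative choices, patterned on the play board used later in this section; they are not taken from the paper's software.

```python
import numpy as np

players = ["A", "B", "C", "D", "E"]
matches = [("A", "B", 21, 16), ("C", "D", 18, 21), ("A", "E", 19, 21),
           ("B", "C", 25, 27), ("D", "E", 22, 20), ("A", "D", 21, 18)]

def collect(matches, players):
    """Collect paired scores into (a, rho, A).

    a   : exponent of each x_i in the numerator (ionic counts)
    rho : exponent of each pair sum (negative, one per match)
    A   : 0-1 matrix, one row per match, marking the two players involved
    """
    idx = {name: i for i, name in enumerate(players)}
    a = np.zeros(len(players))
    A = np.zeros((len(matches), len(players)))
    rho = np.zeros(len(matches))
    for j, (u, v, score_u, score_v) in enumerate(matches):
        a[idx[u]] += score_u
        a[idx[v]] += score_v
        A[j, idx[u]] = 1
        A[j, idx[v]] = 1
        rho[j] = -(score_u + score_v)
    return a, rho, A

def log_likelihood(x, a, rho, A):
    """Log of prod_i x_i^a_i * prod_j (A_j . x)^rho_j."""
    x = np.asarray(x, dtype=float)
    return float(a @ np.log(x) + rho @ np.log(A @ x))

a, rho, A = collect(matches, players)
print(a)    # collected ionic exponents
print(rho)  # exponents of the pair sums
```

The exponents sum to zero (`a.sum() + rho.sum()`), which reflects that the Bradley-Terry likelihood is unchanged when all $x_i$ are rescaled by a common factor; the constraint $\sum_i x_i = 1$ only fixes the scale.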

Example 2.3. A search engine provider wants to estimate the share of trust that global web users place in 5 types
of information websites: online encyclopedias, online newspapers, web forums, personal blogs, and online marketing
sites. They record data in the following way. Every time a search result page contains some of the 5 types, the
shown types that are clicked receive the scores 5, 4, 3, 2, 1 in the time order of the clicks; the types that are shown
but not clicked score 0; the remaining types that are not shown are excluded from the particular record.
Table 2.2 lists the first 4 entries of data (which potentially grow unboundedly) gathered on the web.



Table 2.2: Web source comparison 
jf Ency. News Forum Blog Mkt. 

1 5 4 - - 0~ 

2 - 5 4 3 - 

3 3 4 5 

4 5 - 



The likelihood can be written as (with $\sum_i x_i = 1$)

54 543 345 

X-^X 2 x 2 x^x^ X 2 X^X^y 



(xi + x 2 + x 5 ) 9 (x 2 + x 3 + X4) 12 (x 2 +x i + x 5 ) 9 (xi + x 3 + X4 + x 5 ) 5 

Example 2.4. Frequencies ($\rho$) over the cross-classification by a pair of binary variables are sampled from a
population. The raw frequency data contain not only frequencies of complete (ionic) classifications, but
also of incomplete (unionic) classifications and conditional classifications. It is required to use all of them to
determine the relative frequency (normalizing to the $p$-measure) of each of the 4 categories. An example is given
as the contingency tables in Table 2.3.

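The likelihood can be written as (with $\sum_i x_i = 1$)

$$x_1^{17} x_2^{29} x_3^{24} x_4^{15} \cdot (x_1 + x_2)^{12} (x_3 + x_4)^{8} (x_1 + x_3)^{24} (x_2 + x_4)^{20} \cdot \frac{x_2^{8} x_3^{14}}{(x_2 + x_3)^{22}} \cdot \frac{x_2^{4} x_4^{2}}{(x_2 + x_4)^{6}} \cdot \frac{x_1^{5}}{(x_1 + x_4)^{5}} \cdot \frac{x_1 x_3^{2}}{(x_1 + x_3 + x_4)^{3}}$$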
Table 2.3: Three types of Categories

Ionic categories with Unionic categories on two margins

         A=0    A=1    A=x
B=0       17     29     12
B=1       24     15      8
B=x       24     20

Conditional categories

         A=0    A=1               A=0    A=1
B=0               8      B=0               4
B=1       14             B=1               2

         A=0    A=1               A=0    A=1
B=0        5             B=0        1
B=1                      B=1        2

simplifying to

$$\frac{x_1^{23} x_2^{41} x_3^{40} x_4^{17}\, (x_1 + x_2)^{12} (x_3 + x_4)^{8} (x_1 + x_3)^{24} (x_2 + x_4)^{14}}{(x_2 + x_3)^{22} (x_1 + x_4)^{5} (x_1 + x_3 + x_4)^{3}}$$

For this likelihood, we give our first illustration of how the structural understanding can be applied to find
the maximum likelihood probabilities. To optimize, we will play the game laid out in Figure 2.3. The bottom
and right margins hold the counts, while the center 0-1 weaving grid, topped with a row of straight zeros, specifies
the matrix of the game, which alludes to the pattern of cracking of the sample space. The top and left margins
are the numbers to be filled in consistently by the following 3 rules:

(Rule 1) Every "?" of the top margin, when multiplied by the sum of those "?"s of the left margin that are filtered by a
"0" in the column headed by that top-margin "?", must equal the corresponding count at
the bottom margin of that column.

(Rule 2) Every "?" on the left margin except the leading one, when multiplied by the sum of those "?"s of the top
margin that are filtered by a "1" in the row led by that left-margin "?", must equal the corresponding count on the
right margin of that row.

(Rule 3) The top-margin "?"s must sum to 1.





              ?      ?      ?      ?
     ?        0      0      0      0
     ?        1      1      0      0       12
     ?        0      0      1      1        8
     ?        1      0      1      0       24
     ?        0      1      0      1       14
     ?        0      1      1      0      -22
     ?        1      0      0      1       -5
     ?        1      0      1      1       -3
             23     41     40     17

Figure 2.3: Play board of the optimization game

By the time one consistently fills all the "?"s, the top-margin "?"s will reveal the maximum-likelihood
probabilities. (For your interest, the answer to this game is Figure 2.4.) Here is how we solve it. First we initialize
the top row to a good guess: proportional to the bottom-margin counts while summing to 1. (1) Then we are
able to determine all the left-margin "?"s except the left-top "?" by Rule 2. (2) Then we determine the left-top
"?" by the formula of Lemma 5.1 below. 5 (3) Then we determine the top-margin "?"s by Rule 1 (with
normalization to comply with Rule 3). These 3 steps complete an iteration. With just 10 iterations we will solve the
problem with a good precision: $10^{-9}$ by the sum of squared deviations used in Lemma 5.1. We see this will not be

5 In Lemma 5.1's notation, what we will here use to fill the left-top "?" is a value of the variable denoted $\tau_0$
there, so it is in fact the number that minimizes a global error measure that is in quadratic relationship with $\tau_0$. More notations:
$x$ there is the top-margin "?"s here; $t(x)$ there is the left-margin "?"s except the left-top "?" here; $A$ there is the weaving matrix
(removing the top-row zeros), transposed, here; $\mathbf{1}_{n \times q}$ there is an all-one matrix; $a$ (not $a(x)$) there is the bottom margin here.
a computationally expensive procedure at all! In fact, the convergence is linear, i.e., every iteration reduces the
error by an order of magnitude, and that is how just 10 iterations take us from a starting error of $10^{1}$ down to $10^{-9}$;
if we continue, 10 more iterations will bring the precision to $10^{-19}$.
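
The three steps of the game can be sketched in Python as follows. This is a minimal sketch under a stated assumption, not the Weaver algorithm of Section 5: since the formula of Lemma 5.1 is not reproduced in this section, step (2) below fills the left-top "?" with the value that makes the Rule 1 products sum to the bottom margin, namely (sum of bottom margin) + (sum of right margin) - (sum of the other left-margin entries). The names `play`, `A`, `rho`, `a`, `t` and `tau0` are ours.

```python
import numpy as np

# Play board of Figure 2.3: one row per unionic/conditional event,
# one column per ionic category (the all-zero top row is handled via tau0).
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 1]], dtype=float)
rho = np.array([12, 8, 24, 14, -22, -5, -3], dtype=float)  # right margin
a = np.array([23, 41, 40, 17], dtype=float)                # bottom margin

def play(a, rho, A, iterations=200):
    """Fill the '?'s of the board by repeating steps (1)-(3)."""
    x = a / a.sum()                           # initial guess for the top margin
    for _ in range(iterations):
        t = rho / (A @ x)                     # step (1): Rule 2
        # step (2): ASSUMPTION standing in for Lemma 5.1 (see lead-in)
        tau0 = a.sum() + rho.sum() - t.sum()
        x = a / (tau0 + (1.0 - A).T @ t)      # step (3): Rule 1 ...
        x = x / x.sum()                       # ... normalized for Rule 3
    t = rho / (A @ x)
    tau0 = a.sum() + rho.sum() - t.sum()
    return x, tau0, t

x, tau0, t = play(a, rho, A)
print(np.round(x, 4), round(tau0, 2), np.round(t, 2))
```

The printed top margin, left-top entry, and remaining left-margin entries can be compared with Figure 2.4. Any fixed point of this loop satisfies Rules 1 to 3 exactly, whichever admissible choice is made in step (2); the choice only affects how fast the loop settles.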



3 Inequality 



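Lemma 3.1. For positive real numbers $x_1, \ldots, x_n > 0$ and $a_1, \ldots, a_n > 0$ we have

$$\prod_{i=1}^{n} x_i^{a_i} \le \frac{\prod_{i=1}^{n} a_i^{a_i}}{\left(\sum_{i=1}^{n} a_i\right)^{\sum_{i=1}^{n} a_i}} \left(\sum_{i=1}^{n} x_i\right)^{\sum_{i=1}^{n} a_i} \tag{3.1}$$

attaining equality iff (if and only if) there exists a common positive ratio $k > 0$ such that for each $i$ it holds that
$x_i / a_i = k$.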

Proof. (Work with $x_i / a_i$ and connect to the Weighted AM-GM Inequality, with its equality condition.) Rewrite
the target inequality as

$$\prod_{i=1}^{n} y_i^{a_i} \le \left(\frac{\sum_{i=1}^{n} a_i y_i}{\sum_{i=1}^{n} a_i}\right)^{\sum_{i=1}^{n} a_i} \tag{3.2}$$

$$\prod_{i=1}^{n} y_i^{w_i} \le \sum_{i=1}^{n} w_i y_i \tag{3.3}$$

where we used the substitution $y_i = x_i / a_i$ in Eqn (3.2) and another substitution $w_i = a_i / \sum_{i=1}^{n} a_i$ in Eqn (3.3). The latter
is the Weighted AM-GM Inequality.

It is crucial that we now check and confirm that all equalities hold jointly iff $x_i / a_i = k$ for all $i$, for some
uniform constant $k$, which must be positive.

For the Weighted AM-GM Inequality, there are two classical proofs: the first one, due to G. Pólya, links it
with the fundamental inequality $e^x \ge 1 + x$, valid for every real number $x$; the second one takes the logarithm of the
inequality and uses the fact that the logarithm function satisfies Jensen's Inequality. □

Remark 3.2. Lemma 3.1 should be understood in two aspects. First is an understanding of the equality
attainment condition: given the power vector $a$, and the functional form of the left hand side $\prod_{i=1}^{n} x_i^{a_i}$, of which

            0.2288   0.3126   0.3153   0.1433
  87.41       0        0        0        0
  22.16       1        1        0        0        12
  17.44       0        0        1        1         8
  44.11       1        0        1        0        24
  30.71       0        1        0        1        14
 -35.04       0        1        1        0       -22
 -13.44       1        0        0        1        -5
  -4.36       1        0        1        1        -3
               23       41       40       17

Figure 2.4: Answer to Figure 2.3
the base vector $x$ is variable, if we choose to put $x$ on the same line as $a$, then all the inequalities
become equalities, equating the left hand side's product form to the right hand side's summation form, and
vice versa. Second is to notice that the fraction prefixing the right hand side is a function solely of the $a$'s;
hence the inequality describes the increase of value when some ions in the product form a union, or equivalently,
the decrease of value when some unions are split into ions.

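Example 3.3. We demonstrate the intuition behind Lemma 3.1's formalism by showing two integer power
examples.

a. $(x_1 + x_2)^5 \ge \frac{5^5}{3^3 2^2}\, x_1^3 x_2^2$. This is because

$$\frac{x_1^3 x_2^2}{3^3 2^2} = \frac{x_1}{3} \cdot \frac{x_1}{3} \cdot \frac{x_1}{3} \cdot \frac{x_2}{2} \cdot \frac{x_2}{2} \le \left(\frac{3 \cdot \frac{x_1}{3} + 2 \cdot \frac{x_2}{2}}{3 + 2}\right)^{3+2} = \left(\frac{x_1 + x_2}{5}\right)^{5}.$$

The equality is attained iff $(x_1, x_2)$ is co-linear with $(3, 2)$.

b. $(x_1 + x_2)^7 x_3^3 x_4^5 \le \frac{7^7 3^3 5^5}{15^{15}}\, (x_1 + x_2 + x_3 + x_4)^{15}$. This is because

$$\frac{(x_1 + x_2)^7 x_3^3 x_4^5}{7^7 3^3 5^5} \le \left(\frac{3 \cdot \frac{x_3}{3} + 5 \cdot \frac{x_4}{5} + 7 \cdot \frac{x_1 + x_2}{7}}{3 + 5 + 7}\right)^{3+5+7} = \left(\frac{x_1 + x_2 + x_3 + x_4}{15}\right)^{15}.$$

The equality is attained iff $(x_1 + x_2, x_3, x_4)$ is co-linear with $(7, 3, 5)$. More importantly, together with the
inequality in the previous example, the two equalities are jointly attained iff $(x_1, x_2, x_3, x_4)$ is co-linear
with $(21, 14, 15, 25)$.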
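
Inequality (3.1) and its equality condition are easy to probe numerically. The Python sketch below evaluates both sides on a logarithmic scale for random positive $x$ with the exponents $a = (3, 2)$ of Example 3.3a, and then at a point co-linear with $a$. The function name `lemma_sides` is ours, and the sampling range is an arbitrary choice for illustration.

```python
import math
import random

def lemma_sides(x, a):
    """Return (log lhs, log rhs) of inequality (3.1)."""
    sum_a = sum(a)
    lhs = sum(ai * math.log(xi) for xi, ai in zip(x, a))
    rhs = (sum(ai * math.log(ai) for ai in a)
           - sum_a * math.log(sum_a)
           + sum_a * math.log(sum(x)))
    return lhs, rhs

random.seed(1)
a = [3.0, 2.0]
gaps = []
for _ in range(1000):
    x = [random.uniform(0.1, 10.0) for _ in a]
    lhs, rhs = lemma_sides(x, a)
    gaps.append(rhs - lhs)          # should never be negative

# Equality case: x co-linear with a, here x = a / 2.
lhs_eq, rhs_eq = lemma_sides([1.5, 1.0], a)
print(min(gaps), rhs_eq - lhs_eq)
```

Working with logarithms keeps the comparison stable when the exponents are large, as they are in the likelihoods of Section 2.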

Corollary 3.4. If we require $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} a_i = 1$ in Lemma 3.1 then

$$\prod_{i=1}^{n} x_i^{a_i} \le \prod_{i=1}^{n} a_i^{a_i} \tag{3.4}$$

$$\sum_{i=1}^{n} a_i \ln x_i \le \sum_{i=1}^{n} a_i \ln a_i \tag{3.5}$$

and the equalities are attained iff $x = a$.

Remark 3.5. The logarithmic version Eqn (3.5) is the (finite) entropy inequality. 

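Corollary 3.6. Let $x \in (0, +\infty)^n$ be a vector of $n$ positive reals. Let $\delta \in \{0, 1\}^n$ be a vector of $n$ bits.
Let $\beta \in [0, +\infty)^n$ be a non-zero vector of $n$ non-negative reals such that $\beta_i = 0$ if $\delta_i = 0$. Let $b = \sum_{i=1}^{n} \beta_i > 0$.
Let $0^0 := 1$. Then

$$(\delta^{\mathsf T} x)^{b} \ge \frac{b^{b}}{\prod_{i=1}^{n} \beta_i^{\beta_i}} \prod_{i=1}^{n} x_i^{\beta_i}$$

attaining equality iff there exists a positive $k$ such that $x_i / \beta_i = k$ for each of those $i$'s having $\delta_i = 1$.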

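Example 3.7. Let $n = 5$, $\delta = (1, 0, 1, 0, 1)^{\mathsf T}$, $\beta = (3, 0, 4, 0, 6)^{\mathsf T}$, $b = 3 + 0 + 4 + 0 + 6 = 13$. Then $\forall x \in (0, +\infty)^n$,
we have

$$(1 x_1 + 0 x_2 + 1 x_3 + 0 x_4 + 1 x_5)^{13} \ge \frac{13^{13}}{3^3\, 0^0\, 4^4\, 0^0\, 6^6}\; x_1^3 x_2^0 x_3^4 x_4^0 x_5^6$$

attaining equality iff $x_1 : x_3 : x_5 = 3 : 4 : 6$.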
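
The example can be checked directly. The following Python sketch computes the logarithmic gap between the two sides of Corollary 3.6 for the data of Example 3.7, once at a point with $x_1 : x_3 : x_5 = 3 : 4 : 6$ and once at another point; the function name `corollary_gap` and the test points are ours, chosen only for illustration.

```python
import math

def corollary_gap(x, delta, beta):
    """log((delta^T x)^b) minus log of the right-hand side of Corollary 3.6."""
    b = sum(beta)
    lhs = b * math.log(sum(d * xi for d, xi in zip(delta, x)))
    rhs = b * math.log(b)
    for xi, bi in zip(x, beta):
        # Convention 0^0 := 1: terms with beta_i = 0 contribute log(1) = 0.
        if bi > 0:
            rhs += bi * math.log(xi) - bi * math.log(bi)
    return lhs - rhs

delta = [1, 0, 1, 0, 1]
beta = [3, 0, 4, 0, 6]

# x1 : x3 : x5 = 3 : 4 : 6; x2 and x4 are arbitrary positive numbers.
gap_equal = corollary_gap([3.0, 7.0, 4.0, 0.5, 6.0], delta, beta)
# A point that breaks the ratio.
gap_other = corollary_gap([1.0, 7.0, 1.0, 0.5, 1.0], delta, beta)
print(gap_equal, gap_other)
```

Note that the coordinates with $\delta_i = 0$ do not enter either side, which is why $x_2$ and $x_4$ can be set freely in the equality case.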



4 Theory 

The theory is for the understanding of the conditions surrounding maximization. It analytically realizes the
structural relationship between $\rho$ and $p$. The theory culminates in a theorem proving that any $p$ satisfying the
two algebraic rules of Eqn (2.1) is a stationary point of the likelihood. The theory also leaves two conjectures
for future work. Before we start the development, we lay out the map of the theory in Figure 4.1.

4.1. Fragment, Slice, and Superposition. 
Figure 4.1: Map of Theory



4.1.1. Three Objects and some Basic Properties.

Definition 4.1 (Fragment, Slice, and Superposition). Let $n$ and $q$ be two positive integers. Let $x \in (\mathbb{R} \setminus \{0\})^n$
be a vector of length $n$ whose coordinates are non-zero real numbers. Let $A = [\delta_1, \ldots, \delta_q] \in (\{0, 1\}^n \setminus \{0\}^n)^q$ be
a sans-zero-column matrix of $q$ columns, each of which is a vector of $n$ bits representing an (ion or union) event.
Let $\rho \in (\mathbb{R} \setminus \{0\})^q$ be a vector of length $q$ whose coordinates are non-zero real numbers. Note that $\rho_j$, the $j$-th
component of $\rho$, is the count of the event represented by $\delta_j$. Denote by $\mathbf{1}_{s \times t}$ the $s \times t$ matrix of 1s.

a. (order-1 fragment) The power expression

$$\iota = (\delta_1^{\mathsf T} x)^{\rho_1}$$

is an order-1 x-fragment. We call the non-zero vector $\delta_1$ the event pattern of $\iota$, the positive integer $\delta_1^{\mathsf T} \mathbf{1}_{n \times 1}$
the event size of $\iota$, and the exponent $\rho_1$ the event count of $\iota$. An order-1 x-fragment is an ionic x-fragment
iff its event pattern is a standard unit vector.

b. (union of order-1 fragments) Let $\iota_1 = (\delta_1^{\mathsf T} x)^{\rho_1}$ and $\iota_2 = (\delta_2^{\mathsf T} x)^{\rho_2}$ be two order-1 x-fragments.
$\iota_1 \cup \iota_2 = (\max(\delta_1, \delta_2)^{\mathsf T} x)^{\rho_1 + \rho_2}$ is called their union, where the max operation is component-wise. When
$\max(\delta_1, \delta_2) = \mathbf{1}_{n \times 1}$, we call it an exhaustive union of order-1 fragments.

c. (order-q fragment) A product of $q$ order-1 fragments

$$\omega = \prod_{j=1}^{q} (\delta_j^{\mathsf T} x)^{\rho_j}$$

is an order-q x-fragment iff it satisfies a closure condition expressed in the following two equivalent
versions:

1. (closure condition version 1) $\forall j \ne k$, $\delta_j + \delta_k \in \{0, 1\}^n$

2. (closure condition version 2) $A \mathbf{1}_{q \times 1} \in \{0, 1\}^n$

We call the real vector $x$ the vector of ions of $\omega$. We call the number $q$ the order of $\omega$. We call the vector
$\rho$ the event counts of $\omega$. We call the $n \times q$ bit matrix $A = [\delta_1, \ldots, \delta_q]$ the event pattern of $\omega$. We often
write $\omega(x)$ or $\omega(x \mid \rho, A)$ to be explicit about its nature as a function.

d. (slice) An x-fragment

$$e = \prod_{j=1}^{q} (\delta_j^{\mathsf T} x)^{\rho_j}$$

is an x-slice iff the following exhaustiveness is satisfied, i.e., $\sum_{j} \delta_j = \mathbf{1}_{n \times 1}$ or equivalently $A \mathbf{1}_{q \times 1} = \mathbf{1}_{n \times 1}$.
We often write $e(x)$ or $e(x \mid \rho, A)$ to be explicit about its nature as a function. For a point $p \in (\mathbb{R} \setminus \{0\})^n$,
the slice $e$ is said to achieve co-thickness at $p$ iff there exists a single $\tau^e \ne 0$ such that $\rho_j = \tau^e \delta_j^{\mathsf T} p$ for all $j$. Such a $\tau^e = \tau^e(p)$ is
called the co-thickness of $e$ at $p$.

e. (superposition) A product of slices

$$\pi = \prod_{k} e_k$$

is a superposition iff $\exists p \in (\mathbb{R} \setminus \{0\})^n$ such that every slice $e$ in the product achieves co-thickness at
$cp$, $\forall c \ne 0$. We often write $\pi(x)$ or $\pi(x \mid \rho, A)$ to be explicit about its nature as a function. The line
represented by $cp$ is called a compatibility axis of $\pi$. Any slice is a product of only one slice, hence may
be a superposition on its own.

Remark 4.2. We often deal with the case that $x$ is a point on the simplex. For that case we may add to the
name a modifier "simplicial", whose logical attachment is solely with $x$, though we allow lexical associativity so
that the term "(simplicial x)-fragment" means the same as the term "simplicial (x-fragment)". The restricted
version of the notion inherits all properties from the unrestricted version, and may have its own properties. It
is important to understand that the restriction by a simplex really amounts to intersecting the compatibility axis with the
simplex to obtain a single point. Thus, even if the intersecting geometric object is not a simplex, the results
developed here for the whole compatibility axis are still inheritable. There is no need to carry the
restriction along from the beginning to the solution; it suffices to apply the restriction after the whole compatibility
axis is solved.

Remark 4.3. A nuisance matter here is the inclusion/exclusion of zero at certain coordinates of $x$ and $\rho$; we
choose to exclude them all. We think of a slice as a joint probability on a coarsened version of the original
finite space of complete categorization. We think of the max operation on a few $\delta_j$'s as the union of the sets
(categories) for which each bit vector $\delta_j$ serves as the inclusion indicator. On the relation between an x-fragment
and an x-slice, we would say "an x-fragment is in an x-slice" and "an x-fragment of an x-slice" if the former
factorizes the latter.

Remark 4.4. Note that the order property is not a function. Any order-1 fragment is also an order-q fragment.
The definitions of order-1 and order-q fragments do not define two disjoint sets; in fact, one of them is a
subset of the other. For example, the order-1 fragment $(\delta_1^{\mathsf T} x)^5$, when written as $(\delta_1^{\mathsf T} x)^2 (\delta_1^{\mathsf T} x)^2 (\delta_1^{\mathsf T} x)$, is an order-3
fragment. The order property does become single-valued when we address the collected form of any fragment.
The collection is over the same event patterns. Therefore we may also conceptualize the notion of "collected
order".

Example 4.5. Let $n = 5$ and $x = (x_1, x_2, x_3, x_4, x_5)^{\mathsf T}$:

a. $\omega_1 = x_1$ is an order-1 fragment (collected); further, it is an ionic fragment.

• $q = 1$, $\delta_1 = (1, 0, 0, 0, 0)^{\mathsf T}$, $\rho_1 = 1$

b. $\omega_2 = x_1 x_2^2$ is an order-2 fragment and it cannot be an order-1 fragment.

• $q = 2$, $\delta_1 = (1, 0, 0, 0, 0)^{\mathsf T}$, $\rho_1 = 1$, $\delta_2 = (0, 1, 0, 0, 0)^{\mathsf T}$, $\rho_2 = 2$

• closure: $\delta_1 + \delta_2 = (1, 1, 0, 0, 0)^{\mathsf T}$

c. $\omega_3 = x_1 x_2^2 (x_3 + x_4 + x_5)^3$ is an order-3 fragment; further, it is a slice.

• $q = 3$, $\delta_1 = (1, 0, 0, 0, 0)^{\mathsf T}$, $\rho_1 = 1$, $\delta_2 = (0, 1, 0, 0, 0)^{\mathsf T}$, $\rho_2 = 2$, $\delta_3 = (0, 0, 1, 1, 1)^{\mathsf T}$, $\rho_3 = 3$.

• closure and exhaustiveness: $\delta_1 + \delta_2 + \delta_3 = (1, 1, 1, 1, 1)^{\mathsf T}$

d. $\omega_4 = \frac{1}{(x_3 + x_4 + x_5)^2}$ is an order-1 fragment but it is not ionic because its event pattern is not a standard unit
vector.

• $q = 1$, $\delta_1 = (0, 0, 1, 1, 1)^{\mathsf T}$, $\rho_1 = -2$

e. $\omega_5 = \frac{x_2^2 x_4^5}{(x_1 + x_3 + x_5)^3}$ is an order-3 fragment; further, it is a slice.

• $q = 3$, $\delta_1 = (0, 1, 0, 0, 0)^{\mathsf T}$, $\rho_1 = 2$, $\delta_2 = (0, 0, 0, 1, 0)^{\mathsf T}$, $\rho_2 = 5$, $\delta_3 = (1, 0, 1, 0, 1)^{\mathsf T}$, $\rho_3 = -3$.

• closure and exhaustiveness: $\delta_1 + \delta_2 + \delta_3 = (1, 1, 1, 1, 1)^{\mathsf T}$

f. $\omega_6 = \frac{1}{x_1 + x_3 + x_4 + x_5}$ is an order-1 fragment and it is $\omega_1 \cup \omega_4$.

• $q = 1$, $\delta_1 = (1, 0, 1, 1, 1)^{\mathsf T} = \max[(1, 0, 0, 0, 0)^{\mathsf T}, (0, 0, 1, 1, 1)^{\mathsf T}]$, $\rho_1 = -1 = 1 + (-2)$

The following are not x-fragments because each of them violates closure:

a. $\pi_1 = \frac{x_1^4 x_2^6 x_3^8 x_4^{10} x_5^{12} (x_1 + x_2)^5 (x_3 + x_4 + x_5)^{15}}{(x_1 + x_2 + x_3)^9 (x_4 + x_5)^{11}}$ is a product of 3 slices. It is also a superposition
with compatibility axis $c[2, 3, 4, 5, 6]^{\mathsf T}$.

• $q = 9$, $A = \begin{bmatrix} 1&0&0&0&0&1&0&1&0 \\ 0&1&0&0&0&1&0&1&0 \\ 0&0&1&0&0&0&1&1&0 \\ 0&0&0&1&0&0&1&0&1 \\ 0&0&0&0&1&0&1&0&1 \end{bmatrix}$

• $\rho = [\,4 \;\; 6 \;\; 8 \;\; 10 \;\; 12 \;\; 5 \;\; 15 \;\; {-9} \;\; {-11}\,]$

• but $A \mathbf{1}_{9 \times 1} = (3, 3, 3, 3, 3)^{\mathsf T}$

• $e_1 = x_1^4 x_2^6 x_3^8 x_4^{10} x_5^{12}$, $e_2 = (x_1 + x_2)^5 (x_3 + x_4 + x_5)^{15}$, $e_3 = (x_1 + x_2 + x_3)^{-9} (x_4 + x_5)^{-11}$.

b. $x_1 x_2^2 (x_1 + x_2)$

• $q = 3$, $\delta_1 = (1, 0, 0, 0, 0)^{\mathsf T}$, $\rho_1 = 1$, $\delta_2 = (0, 1, 0, 0, 0)^{\mathsf T}$, $\rho_2 = 2$, $\delta_3 = (1, 1, 0, 0, 0)^{\mathsf T}$, $\rho_3 = 1$

• but $\delta_1 + \delta_2 + \delta_3 = (2, 2, 0, 0, 0)^{\mathsf T}$
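
The predicates of Definition 4.1 translate almost literally into code. The Python sketch below checks closure, exhaustiveness, and co-thickness for the three slices $e_1$, $e_2$, $e_3$ of $\pi_1$ at the point $p = (2, 3, 4, 5, 6)^{\mathsf T}$. The function names `is_fragment`, `is_slice`, and `co_thickness`, and the tolerance `tol`, are our own illustrative choices; event patterns are stored as $n \times q$ arrays with one column per event, as in the definition.

```python
import numpy as np

def is_fragment(A):
    """Closure: every ion (row of A) belongs to at most one event (column)."""
    return bool(np.all(A.sum(axis=1) <= 1))

def is_slice(A):
    """Closure and exhaustiveness: every ion belongs to exactly one event."""
    return bool(np.all(A.sum(axis=1) == 1))

def co_thickness(A, rho, p, tol=1e-12):
    """Return tau with rho_j = tau * delta_j^T p for all j, or None."""
    ratios = rho / (A.T @ p)
    if np.all(np.abs(ratios - ratios[0]) < tol):
        return float(ratios[0])
    return None

def col(*bits):
    """Column vector of n = 5 bits."""
    return np.array(bits, dtype=float).reshape(5, 1)

# The three slices of pi_1, each as (event pattern, event counts).
slices = [
    (np.eye(5), np.array([4.0, 6.0, 8.0, 10.0, 12.0])),
    (np.hstack([col(1, 1, 0, 0, 0), col(0, 0, 1, 1, 1)]), np.array([5.0, 15.0])),
    (np.hstack([col(1, 1, 1, 0, 0), col(0, 0, 0, 1, 1)]), np.array([-9.0, -11.0])),
]
p = np.array([2.0, 3.0, 4.0, 5.0, 6.0])

taus = [co_thickness(A, rho, p) for A, rho in slices]
A_all = np.hstack([A for A, _ in slices])   # 5 x 9 pattern of the whole product
print(taus, is_fragment(A_all))
```

Each factor passes `is_slice` and has a common ratio at $p$, which is what makes $\pi_1$ a superposition, while the concatenated pattern fails `is_fragment` because every ion appears in three events.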

4.1.2. Two Partial Orderings.

Definition 4.6 (Covering). Inheriting notations from Definition 4.1:

a. (order-1 covering) Let $\iota_1 = (\delta_1^{\mathsf T} x)^{\rho_1}$ be an order-1 x-fragment. An order-1 x-fragment $\iota_2 = (\delta_2^{\mathsf T} x)^{\rho_2}$ is a
covering fragment of $\iota_1$ iff $\delta_1 \le \delta_2$ (component-wise). We also say $\iota_2$ covers $\iota_1$.

b. (order-q covering) Let $\omega$ be an x-fragment of order $q \ge 2$. An x-fragment $\xi$ of the same order $q$ is a
covering fragment of $\omega$ iff every order-1 x-fragment of $\omega$ has a distinct covering order-1 fragment among
$\xi$'s $q$ order-1 x-fragments. We also say $\xi$ covers $\omega$.

c. (collectability) Two x-fragments $\omega$ and $\xi$ are collectable fragments to each other iff they share the same
set of event patterns. We also say that $\omega$ collects with $\xi$.

Remark 4.7. The event counts p is irrelevant to the definition of a covering fragment. Two fragments of 
different orders do not cover each other. Distinct correspondence between all order- 1 fragments of uj and all 
order-1 fragments of £ means there exists a bijection between the two sets. We will need this concept when 
extricating from divergence wrecked by the zero coordinates and negative exponentiation. We allow the following 
statement to hold: any fragment covers itself. 
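
The covering relation above can be checked mechanically on event patterns alone. The following is a minimal sketch, assuming each order-1 fragment is represented only by its 0/1 event pattern (a tuple) and an order-q fragment by a list of such tuples; the function names `covers1`, `covers`, and `collectable` are hypothetical and chosen for illustration. The "distinct covering fragment" requirement is tested by brute force over all bijections, which is only reasonable for small q.

```python
from itertools import permutations

def covers1(big, small):
    """Order-1 covering: `big` covers `small` iff small <= big component-wise."""
    return all(s <= b for s, b in zip(small, big))

def covers(xi, omega):
    """Order-q covering on event patterns only (counts are irrelevant).

    xi covers omega iff both have the same order and some bijection pairs
    every pattern of omega with a distinct covering pattern of xi.
    Brute force over permutations, so only suitable for small q.
    """
    if len(xi) != len(omega):
        return False
    return any(all(covers1(b, s) for b, s in zip(perm, omega))
               for perm in permutations(xi))

def collectable(xi, omega):
    """Two fragments collect iff they share the same set of event patterns."""
    return set(xi) == set(omega)

x1, x2, x3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
x12 = (1, 1, 0)
print(covers([x12, x3], [x1, x3]))  # x1 is matched to x1+x2, x3 to x3
print(covers([x12, x3], [x1, x2]))  # x1 and x2 would both need x1+x2
```

The second call illustrates why distinctness matters: both ions are covered by the same order-1 fragment, so no bijection exists.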

Lemma 4.8. Inheriting notations from Definition 4.1:

a. "collects with" is an equivalence relation on the set of all x-fragments.

b. Covering is a partial ordering on the set of all x-fragments with respect to the equivalence relation "is
collectable with" on the same set.

c. Two x-fragments are collectable with each other iff they cover each other.

Proof. The first statement is trivial because the usual equality between two matrices underpins the notion.
The third statement is implied by the anti-symmetry requirement of the second. We show the second statement
as follows. A partial ordering requires reflexivity, anti-symmetry, and transitivity. Let $\omega_1 = (\delta_1^T x)^{\rho_1}$, $\omega_2 = (\delta_2^T x)^{\rho_2}$,
$\omega_3 = (\delta_3^T x)^{\rho_3}$ be any three order-1 x-fragments. Reflexivity is trivial: for $\omega_1$, we have $\delta_1 \ge \delta_1$
(component-wise). Anti-symmetry refers to ($\omega_1$ covers $\omega_2$ and $\omega_2$ covers $\omega_1$) $\implies$ $\omega_1$ collects with $\omega_2$, which is
true because the condition means $\omega_1$ and $\omega_2$ share the same event pattern. Transitivity is trivial: ($\delta_1 \ge \delta_2$ and
$\delta_2 \ge \delta_3$) $\implies$ $\delta_1 \ge \delta_3$. □

Notation 4.9. We shall write $\omega_1 \overset{cover}{\ge} \omega_2$ or $\omega_2 \overset{cover}{\le} \omega_1$ iff $\omega_1$ covers $\omega_2$. We shall write $\omega_1 \overset{cover}{=} \omega_2$ iff $\omega_1$ collects
with $\omega_2$.

Example 4.10. 



a. (xi+x 2 ) 3 = (xi+x 2 ) , xf < (xi + x 2 ) 3 , xfx 2 (xi + x 3 ) 

ovt 



X1+X3 
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LUUCI LUUC/ l_UUCI 

b. (x\ +x% +x% + X4 + X5) -7 ^ (xi + ^2 + + £4) 5^ +X2 +£3) ^ (xi + X2) but none of them 
covers x^ because x~[ and x^ do not find distinct covering order-1 fragments in any of them. 

Definition 4.11 (Refinement). Let $\Omega = \prod_{j=1}^{q} (\delta_j^T x)^{\rho_j}$ be a product of q order-1 x-fragments. Another product
of r order-1 x-fragments $\Xi = \prod_{i=1}^{r} (\gamma_i^T x)^{\sigma_i}$ is said to refine $\Omega$ iff $\forall j \in \{1, \ldots, q\}$, $\exists I_j \subseteq \{1, \ldots, r\}$, such that all
the following four conditions are satisfied:

a. $I_j \neq \emptyset$

b. $\forall j_1 \neq j_2$, $I_{j_1} \cap I_{j_2} = \emptyset$

c. $\sum_{i \in I_j} \gamma_i = \delta_j$

d. $\sum_{i \in I_j} \sigma_i \ge \rho_j$

We also say that "$\Xi$ refines $\Omega$" and "$\Omega$ coarsens $\Xi$". When the last condition attains equality for all j,
we say that "$\Xi$ splits $\Omega$."

Remark 4.12. While covering is a partial ordering on the set of all fragments, refinement is a partial ordering
defined on the set of all products of order-1 fragments; fragments, slices, and superpositions are all products
of order-1 fragments. Refinement requires not only refining every event pattern but also dominating the event
counts. Each of these two requirements can be a partial ordering by itself, but we do not bother to define them explicitly. We allow the
following statement to hold: any fragment refines itself. This concept has two sources.
First, we need it when we are dealing with a product of slices involving negative counts while seeking a condition for
the attainment of extremity along the compatibility axis. Second, the concept corresponds to features of the
real sampling process when a category is split into a number of subcategories or its count is increased.
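
The refinement conditions can be checked directly once candidate index sets $I_j$ are supplied. Below is a minimal sketch, assuming a product is stored as a list of (event pattern, count) pairs, that the index sets are given explicitly (0-based), and that conditions a and b are read as "non-empty" and "pairwise disjoint"; the function name `refines` is hypothetical. The data reproduce item a of Example 4.15, where $x_1^2 x_2^4 (x_3+x_4)^{-7} x_5^{100}$ refines $x_1 x_2^2 (x_3+x_4+x_5)^3$.

```python
def refines(xi, omega, index_sets):
    """Check that xi refines omega for the given index sets I_j.

    xi, omega: lists of (pattern, count); index_sets[j] lists the 0-based
    positions in xi that are assigned to the j-th fragment of omega.
    """
    if len(index_sets) != len(omega):
        return False
    used = set()
    for (delta, rho), I in zip(omega, index_sets):
        # conditions a and b: non-empty and pairwise disjoint index sets
        if not I or used & set(I):
            return False
        used |= set(I)
        # condition c: the refining patterns add up to the coarse pattern
        pattern_sum = tuple(sum(xi[i][0][k] for i in I) for k in range(len(delta)))
        # condition d: the refining counts dominate the coarse count
        count_sum = sum(xi[i][1] for i in I)
        if pattern_sum != tuple(delta) or count_sum < rho:
            return False
    return True

xi_1 = [((1, 0, 0, 0, 0), 2), ((0, 1, 0, 0, 0), 4),
        ((0, 0, 1, 1, 0), -7), ((0, 0, 0, 0, 1), 100)]
omega_1 = [((1, 0, 0, 0, 0), 1), ((0, 1, 0, 0, 0), 2), ((0, 0, 1, 1, 1), 3)]
print(refines(xi_1, omega_1, [[0], [1], [2, 3]]))
```

Finding suitable index sets automatically is a separate search problem that this sketch does not attempt.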

Lemma 4.13. Inheriting the notations from Definition 4.11:

a. Refinement is a partial ordering on the set of all products of order-1 x-fragments with respect to the usual
equality between two functions as the equivalence relation.

b. Two products of order-1 x-fragments refine each other iff they are the same.

c. Splitting preserves total counts.

Proof. The second statement is a consequence of the anti-symmetry requirement of the first. We show the first
statement. Reflexivity and anti-symmetry are trivial because when comparing a product to itself we have:
$q = r$, $I_j = \{j\}$, $\sum_{i \in I_j} \gamma_i = \gamma_j = \delta_j$, and $\sum_{i \in I_j} \sigma_i = \sigma_j = \rho_j$. Transitivity is not hard to show either: we just have
one more nesting in the last 2 requirements. Let $\Omega = \prod_{j=1}^{q} (\delta_j^T x)^{\rho_j}$ be refined by $\Xi = \prod_{i=1}^{r} (\gamma_i^T x)^{\sigma_i}$, which is in turn refined by $\Psi = \prod_{l=1}^{t} (\eta_l^T x)^{\theta_l}$.
For the former pair we continue to use the same "$\forall j \in \{1, \ldots, q\}$, $\exists I_j \subseteq \{1, \ldots, r\}$" notation, and for the latter
pair we use the analogous "$\forall i \in \{1, \ldots, r\}$, $\exists J_i \subseteq \{1, \ldots, t\}$" notation so that $\sum_{l \in J_i} \eta_l = \gamma_i$, hence $\sum_{i \in I_j} \sum_{l \in J_i} \eta_l = \delta_j$,
which means each $\delta_j$ is the sum of a unique subset of the $\eta_l$'s. This satisfies the third condition. For the last
condition, from $\Psi$ refining $\Xi$ we have $\sum_{l \in J_i} \theta_l \ge \sigma_i$ and from $\Xi$ refining $\Omega$ we have $\sum_{i \in I_j} \sigma_i \ge \rho_j$; eliminating $\sigma_i$ we
have $\sum_{i \in I_j} \sum_{l \in J_i} \theta_l \ge \rho_j$. So the last condition holds and the proof is complete.

For Nr. 3: Suppose $\Xi$ splits $\Omega$; then the total count of $\Omega$ is $\sum_{j=1}^{q} \rho_j = \sum_{j=1}^{q} \sum_{i \in I_j} \sigma_i$, which is just the total count
of $\Xi$. □

Notation 4.14. We shall write $\Omega_1 \overset{refine}{\ge} \Omega_2$ or $\Omega_2 \overset{refine}{\le} \Omega_1$ iff $\Omega_1$ refines $\Omega_2$. We simply write $\Omega_1 = \Omega_2$ iff $\Omega_1$ and
$\Omega_2$ refine each other.

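Example 4.15. Referring to the earlier example of fragments:

a. $\Xi_1 = x_1^{2} x_2^{4} (x_3 + x_4)^{-7} x_5^{100} \overset{refine}{\ge} \Omega_1 = x_1 x_2^{2} (x_3 + x_4 + x_5)^{3}$

• $q = 3$, $r = 4$, $\{I_1, I_2, I_3\} = \{\{1\}, \{2\}, \{3, 4\}\}$,

• $\sum_{i \in I_1} \gamma_i = (1,0,0,0,0)^T = \delta_1$, $\sum_{i \in I_1} \sigma_i = 2 > 1 = \rho_1$,

• $\sum_{i \in I_2} \gamma_i = (0,1,0,0,0)^T = \delta_2$, $\sum_{i \in I_2} \sigma_i = 4 > 2 = \rho_2$,

• $\sum_{i \in I_3} \gamma_i = \gamma_3 + \gamma_4 = (0,0,1,1,1)^T = \delta_3$, $\sum_{i \in I_3} \sigma_i = \sigma_3 + \sigma_4 = -7 + 100 > 3 = \rho_3$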

3 4 5 refine 3 4 5/ , \2 refine , ,4 

, a^ajjajoaiia^ C x^x 2 x 3 (ii+x 2 ) i (xi+:E2+:C3) 

0. ^ , _ ^ 



X1X3X2X3 ^ (X1+X3) 4 (x 2 +x 3 ) 5 ^ (xi+iE3) 4 :E2(:E2+ii!3) 11 
/me 3 

> (xi + £2 + £3) 



re/me 

c- £5 = (xi + X3)(a;2 + X4) 100 X5 does noi refine U5 — , because although (x\ + £3)2:5 ^ (xi + 

refine 

x 3 + x 5 )- 3 , (x 2 + x 4 ) 100 < . 

4.2. More on Thickness. We have already encountered the notion of compatibility in the definition of su- 
perposition. We have also illustrated both the algebraic aspect and the geometric aspect of co-thickness. In 
a nutshell, compatibility is existence of a co-thickness; in a nutshell, co-thickness is uniform thickness of data 
across all events in the slice. 

Notation 4.16. Denote by $\tau(\Omega, x_0)$ the $x_0$-thickness of $\Omega$, a product of some order-1 fragments. $\tau(\Omega, x_0)$ is a
vector, except when $\Omega$ is an order-1 fragment, in which case $\tau$ is a scalar.

Definition 4.17. Let $\Omega = \prod_{j=1}^{q} (\delta_j^T x)^{\rho_j}$ be a product of q order-1 x-fragments. The $x_0$-thickness of $\Omega$ is defined
as

$$\tau(\Omega, x_0) = \left( \frac{\rho_1}{\delta_1^T x_0}, \ldots, \frac{\rho_q}{\delta_q^T x_0} \right)^T,$$

a column vector of q scalar components. When $q = 1$, $\tau(\Omega, x_0) = \frac{\rho_1}{\delta_1^T x_0}$ is scalar-valued.

Lemma 4.18 (Inverse proportionality of thickness). $\tau(\Omega, c x_0) = \frac{1}{c} \tau(\Omega, x_0)$, $\forall c \neq 0$.
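
As a small illustration of the thickness vector and of Lemma 4.18, the sketch below evaluates the ratios $\rho_j / (\delta_j^T x_0)$ exactly with rational arithmetic. It assumes the product is passed as parallel lists of 0/1 patterns and counts; the function name `thickness` is hypothetical. The data are the slice $(x_1+x_2)^5 (x_3+x_4+x_5)^{15}$ evaluated at $(2,3,4,5,6)$ and at three times that point.

```python
from fractions import Fraction

def thickness(patterns, counts, x0):
    """x0-thickness: one ratio rho_j / (delta_j^T x0) per order-1 fragment."""
    return [Fraction(rho) / sum(d * v for d, v in zip(delta, x0))
            for delta, rho in zip(patterns, counts)]

patterns = [(1, 1, 0, 0, 0), (0, 0, 1, 1, 1)]
counts = [5, 15]
y = (2, 3, 4, 5, 6)

t1 = thickness(patterns, counts, y)
t3 = thickness(patterns, counts, [3 * v for v in y])
print(t1, t3)  # scaling the point by 3 divides every component by 3
```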

Definition 4.19 (Compatibility between order-1 fragments). Two order-1 fragments $\iota_1$ and $\iota_2$ are y-compatible
iff there exists a point $y \in (\mathbb{R} \setminus \{0\})^n$ such that $\tau(\iota_1, y) = \tau(\iota_2, y)$.

Note 4.20. Due to (inverse) proportionality, we see that if $\iota_1$ and $\iota_2$ are y-compatible, then they are cy-compatible,
$\forall c \neq 0$.

Definition 4.21 (Co-thickness of a slice and Superposability of slices). Let $e = \prod_{j=1}^{q} (\delta_j^T x)^{\rho_j}$ be an x-slice. If
there exists a point $y \in (\mathbb{R} \setminus \{0\})^n$ such that all order-1 fragments in e have the same y-thickness, then this
y-thickness is a co-thickness of e, denoted by $\tau^{e}(y)$. That is, $\frac{\rho_1}{\delta_1^T y} = \cdots = \frac{\rho_q}{\delta_q^T y} = \tau^{e}(y)$. The slice e is said to
achieve co-thickness at y. Two slices are y-superposable iff there exists a point $y \in (\mathbb{R} \setminus \{0\})^n$ at which both
slices achieve co-thickness.

Notation 4.22. Denote by $e_1 \overset{y}{\asymp} e_2$ that the slices $e_1$ and $e_2$ are y-superposable.

Lemma 4.23. $\overset{y}{\asymp}$ is an equivalence relation over the set of all x-slices, for some given $y \in (\mathbb{R} \setminus \{0\})^n$.

Remark 4.24. A co-thickness is a scalar-valued property for a slice. The prefix 'co-' stands for 'compatible'. In 
prefixing 'thickness', it basically equals 'iso-'. Regarding the correspondence between co-thickness and maxi- 
mization, it seems that nature has preferences for the equitable, homogeneous allocation of count thickness over 
the events. 

Lemma 4.25. A product of order-1 fragments is a superposition only if, after collecting every class of collectable
order-1 fragments, the matrix taking all the $\delta$'s as the columns has its every row summing to at least 1.

Proof. Let $\prod_{j=1}^{q} (\delta_j^T x)^{\rho_j}$ be the product after collecting every class of collectable order-1 x-fragments so that
after collecting, all $\delta_j$'s are distinct. Let $\Delta = [\delta_1, \ldots, \delta_q]$ be the column-binding of the individual $\delta_j$'s. Denote
by $\delta_{(i)}$ the i-th row of $\Delta$. Then any $\delta_{(i)}$ not summing to at least 1 means that $\delta_{(i)} = 0$, which means the
exhaustiveness condition of a slice is violated and the product can never be factorized into a product of any
slices, not to mention any group of compatible slices. □



Sjx ' ' ' ' S^Xq 
Definition 4.26 (Total y-thickness of a superposition). For a superposition $\pi$ with a compatibility axis passing
through a point y, we define its 'total y-thickness' as $\bar{\tau}(\pi, y) := \sum_{k=1}^{g} \tau^{e_k}(y)$, where $\{e_k\}_{k=1}^{g}$ is a set of y-superposable
slices whose product equals $\pi$.

Lemma 4.27. Inheriting notations from Definition 4.26: The co-thickness superposed over each ion is
equal to the total thickness of the superposition, that is, $\Delta \, \tau(\pi, y) = \bar{\tau}(\pi, y) \, 1_{n \times 1}$.
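
Lemma 4.27 can be checked numerically on the nine-fragment superposition $\pi_1$ from the earlier list of non-fragments. The sketch below is an illustration only: it assumes the column order of the event pattern matrix (the five ions first, then the two positive unions, then the two negative unions) and evaluates everything at $y = (2,3,4,5,6)$ with exact rationals. The variable names are hypothetical.

```python
from fractions import Fraction

# Event patterns (columns of Delta) and counts of the nine order-1 fragments.
patterns = [(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0),
            (0, 0, 0, 1, 0), (0, 0, 0, 0, 1),
            (1, 1, 0, 0, 0), (0, 0, 1, 1, 1),
            (1, 1, 1, 0, 0), (0, 0, 0, 1, 1)]
counts = [4, 6, 8, 10, 12, 5, 15, -9, -11]
y = (2, 3, 4, 5, 6)

# y-thickness of every order-1 fragment: rho_j / (delta_j^T y)
tau = [Fraction(rho, sum(d * v for d, v in zip(delta, y)))
       for delta, rho in zip(patterns, counts)]

# Delta * tau: thickness stacked over each ion
overhead = [sum(delta[i] * t for delta, t in zip(patterns, tau))
            for i in range(len(y))]
print(tau)
print(overhead)
```

Every ion carries the same stacked thickness, which is the sum of the co-thicknesses of the three slices.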

Example 4.28. Consider the superposition

$$\pi_1 = \frac{x_1^{4} x_2^{6} x_3^{8} x_4^{10} x_5^{12} (x_1 + x_2)^{5} (x_3 + x_4 + x_5)^{15}}{(x_1 + x_2 + x_3)^{9} (x_4 + x_5)^{11}}$$

in an earlier example where we have declared it should not be a fragment of any x-slice, but a product of three
x-slices

$$e_1 = x_1^{4} x_2^{6} x_3^{8} x_4^{10} x_5^{12}, \qquad e_2 = (x_1 + x_2)^{5} (x_3 + x_4 + x_5)^{15}, \qquad e_3 = \frac{1}{(x_1 + x_2 + x_3)^{9} (x_4 + x_5)^{11}}$$

superposable at $y = (2, 3, 4, 5, 6)^T$ (or any scaling of this). The co-thickness achieved at y by slice $e_1$ is $\tau^{e_1}(y) = 2$.
Similarly for $e_2$ and $e_3$, $\tau^{e_2}(y) = 1$ and $\tau^{e_3}(y) = -1$. Thus $e_1 \overset{y}{\asymp} e_2 \overset{y}{\asymp} e_3$ and their product $\pi_1$ is a justified
superposition (of them). The event pattern matrix of $e_1$ is a 5 × 5 identity matrix; of $e_2$ and $e_3$ they are both
5 × 2 bit matrices. Therefore the event pattern of the whole superposition $\pi_1$ is a 5 × 9 matrix. The total thickness
of $\pi_1$ at y is $\bar{\tau}(\pi_1, y) = \tau^{e_1}(y) + \tau^{e_2}(y) + \tau^{e_3}(y) = 2$.
4.3. Slicing Algorithm and Fundamental Conjecture.

Definition 4.29 (Slicing Algorithm). Given a superposition $\pi$, an algorithm $\mathfrak{A}$ takes as input the initial syntax of
$\pi$ and proceeds to find a finite set of mutually compatible slices $\{e_1, \ldots, e_g\}$ so that $\pi$ is their
superposition. Such an algorithm $\mathfrak{A}$ is called a slicing algorithm for $\pi$. A slicing algorithm is also said to be
slicing $\pi$.

Remark 4.30. A slicing algorithm does not ionize nor unionize any fragments. A slicing algorithm does not 
change the event pattern — it merely redistributes the event counts. 

Example 4.31. Let $a, b, c, d > 0$ satisfy $a + b + c + d = 1$; then

$$\pi(a, b, c, d) = \frac{a^{2} b^{3} c^{4} d^{5}}{(a+b)^{4} (c+d)^{6}}$$

attains maximum at $(a_m, b_m, c_m, d_m) = \left( \frac{1}{10}, \frac{3}{20}, \frac{1}{3}, \frac{5}{12} \right)$. We explain the method as follows. Write

$$\pi = a^{as} b^{bs} c^{cs} d^{ds} \cdot (a+b)^{(a+b)u} (c+d)^{(c+d)u} \cdot (a+b)^{(a+b)v} c^{cv} d^{dv} \cdot (c+d)^{(c+d)w} a^{aw} b^{bw}$$

and note that we have 6 powers but total 7 variables, therefore one of the auxiliary variables s, u, v, w must be
redundant. We can set one of them to zero. Let's set $s = 0$ and generate the equations to match the exponents:

$$aw = 2, \quad bw = 3, \quad cv = 4, \quad dv = 5, \quad (a+b)(u+v) = -4, \quad (c+d)(u+w) = -6, \quad a+b+c+d = 1.$$

Solving,

$$a = 2/w, \quad b = 3/w, \quad c = 4/v, \quad d = 5/v$$
$$(u + v) = -4w/5, \quad (u + w) = -2v/3$$
$$\to u = -v - 4w/5 = -w - 2v/3$$
$$\to v/3 = w/5 \to v = 3t, \; w = 5t$$
$$\to 1 = a + b + c + d = \frac{5}{w} + \frac{9}{v} = \frac{4}{t} \to t = 4$$
$$\to v = 12, \; w = 20 \quad (\to u = -28)$$
$$\to a = \frac{1}{10}, \; b = \frac{3}{20}, \; c = \frac{1}{3}, \; d = \frac{5}{12}.$$
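
The arithmetic of this example can be double-checked with a few lines of code. The sketch below is only a sanity check under the stated values ($s = 0$, $u = -28$, $v = 12$, $w = 20$): it confirms with exact rationals that the slice exponents reproduce the six powers of the density, and then compares the log-density at the claimed maximizer against 1000 pseudo-random points of the simplex. The helper `log_pi` and the sampling scheme are assumptions made for illustration, not part of the method itself.

```python
from fractions import Fraction as F
import math
import random

a, b, c, d = F(1, 10), F(3, 20), F(1, 3), F(5, 12)
s, u, v, w = 0, -28, 12, 20

# Exponents implied by the four slices, to be matched against 2, 3, 4, 5, -4, -6.
matched = [a * (s + w), b * (s + w), c * (s + v), d * (s + v),
           (a + b) * (u + v), (c + d) * (u + w)]

def log_pi(a, b, c, d):
    """Logarithm of a^2 b^3 c^4 d^5 / ((a+b)^4 (c+d)^6)."""
    return (2 * math.log(a) + 3 * math.log(b) + 4 * math.log(c) + 5 * math.log(d)
            - 4 * math.log(a + b) - 6 * math.log(c + d))

best = log_pi(*map(float, (a, b, c, d)))

rng = random.Random(0)  # fixed seed so the check is reproducible
samples = []
for _ in range(1000):
    raw = [rng.uniform(0.01, 1.0) for _ in range(4)]
    total = sum(raw)
    samples.append(log_pi(*[r / total for r in raw]))

print(matched)
print(best >= max(samples))
```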
Conjecture 4.32 (The Fundamental Conjecture of Superposition). [FC] We give the following two equivalent
statements of this conjecture:

a. Every product of slices is a superposition.

b. Every product of order-1 fragments is a superposition, with respect to the smallest possible vector of ions
implied from all of the order-1 fragments.

Remark 4.33. The conjecture is essentially asserting the existence statement in Definition e of superposition. We 
need the second version because it is the version more flexible and may be useful in a recursive implementation. 
We defer the proof of this conjecture for the future, with the strong faith that it is correct, at least under a few 
reasonable regularity conditions. 

For now, we give a reason why the conjecture is plausible. We define the trivial slicing algorithm $\mathfrak{A}(\pi)$ with
the conception that it will transform the claim of the conjecture into a claim of the existence of a solution to a
polynomial system. That solution set is our goal: the compatibility axis of the superposition.

Overall, the trivial slicing algorithm (TSA) uses ionic fragments to fill the vacancies left by every 
unionic order-1 fragment to form a complete slice. In addition, there will usually be a slice completely made of 
ionic fragments. 

Let x be the smallest possible vector of ions implied from the product. In the pattern-collected form, the
product is written as $\pi = \prod_{k=1}^{M} (\delta_k^T x)^{\rho_k}$. We then separate the set of ionic fragments from the set of non-ionic
order-1 x-fragments and write

$$\pi = \prod_{i=1}^{n} x_i^{a_i} \prod_{j=1}^{Q} (\delta_j^T x)^{b_j},$$

where $a_i$ can take value 0 to indicate absence of $x_i$.

Let p be the point of intersection of the compatibility axis with the simplex having the same number of
vertices as the length of x. We define the 0-th slice to be made all of ionic fragments, with co-thickness
$\tau^{e_0}(p) = \tau_0$:

$$e_0 = \prod_{i=1}^{n} x_i^{\tau_0 p_i} \qquad (4.1)$$

We then define the 1st to Q-th slices, with co-thickness $\tau^{e_j}(p) = \tau_j$, to be

$$e_j = (\delta_j^T x)^{b_j} \prod_{k \in \{k \in \{1, \ldots, n\} : \delta_{jk} = 0\}} x_k^{\tau_j p_k}, \qquad \forall j \in \{1, \ldots, Q\} \qquad (4.2)$$

These Q + 1 slices are automatically superposable because they are defined to share the axis along p.

We need to determine the n components of the intersection p and the Q + 1 coefficients $\tau_0, \tau_1, \ldots, \tau_Q$, which
is to say that we solve the following system of polynomial equations for the n + Q + 1 unknowns.

$$\begin{cases}
b_1 = \tau_1 (\delta_1^T p) \\
\quad \vdots \\
b_Q = \tau_Q (\delta_Q^T p) \\
a_1 = p_1 \left( \tau_0 + \sum_{j \in \{r \in \{1, \ldots, Q\} : \delta_{r1} = 0\}} \tau_j \right) \\
\quad \vdots \\
a_n = p_n \left( \tau_0 + \sum_{j \in \{r \in \{1, \ldots, Q\} : \delta_{rn} = 0\}} \tau_j \right) \\
1 = p_1 + \cdots + p_n
\end{cases} \qquad (4.3)$$

In the lower a-block equations, the complicated summation of $\tau_j$'s can be simplified if we use $\delta_{(r)}^T$ to denote the
r-th row of the n × Q matrix $\Delta = [\delta_1, \ldots, \delta_Q]$, and use the Q × 1 column vector $\tau = (\tau_1, \ldots, \tau_Q)^T$:

$$\begin{aligned}
a_1 &= p_1 \left[ \tau_0 + \left( 1_{1 \times Q} - \delta_{(1)}^T \right) \tau \right] \\
&\;\;\vdots \\
a_n &= p_n \left[ \tau_0 + \left( 1_{1 \times Q} - \delta_{(n)}^T \right) \tau \right]
\end{aligned} \qquad (4.4)$$

These are n + Q + 1 polynomial equations written on exactly n + Q + 1 indeterminates. It remains for the future
to show that these equations have at least one solution given any parameter triple $(a, b, \Delta)$.
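
To make the TSA system concrete, the sketch below evaluates its residuals on the density of Example 4.31, $a^2 b^3 c^4 d^5 / ((a+b)^4 (c+d)^6)$, reading the system as: an upper block $b_j = \tau_j (\delta_j^T p)$, a lower block $a_i = p_i(\tau_0 + \sum_{j : \delta_{ji} = 0} \tau_j)$, and the sum-to-one equation. The function name `tsa_residuals` is hypothetical, and the values $\tau_0 = 28$, $\tau = (-16, -8)$ were worked out by hand for this illustration; they are not quoted from the text. The sketch checks a candidate solution and does not solve the system.

```python
from fractions import Fraction as F

a = [2, 3, 4, 5]                          # ionic counts a_i
b = [-4, -6]                              # unionic counts b_j
patterns = [(1, 1, 0, 0), (0, 0, 1, 1)]   # event patterns delta_1, delta_2

def tsa_residuals(p, tau0, tau):
    """Left-hand side minus right-hand side for all n + Q + 1 equations."""
    n, Q = len(a), len(b)
    upper = [tau[j] * sum(dk * pk for dk, pk in zip(patterns[j], p)) - b[j]
             for j in range(Q)]
    lower = [p[i] * (tau0 + sum(tau[j] for j in range(Q) if patterns[j][i] == 0)) - a[i]
             for i in range(n)]
    return upper + lower + [sum(p) - 1]

p = [F(1, 10), F(3, 20), F(1, 3), F(5, 12)]
residuals = tsa_residuals(p, 28, [-16, -8])
print(residuals)  # all zero when (p, tau0, tau) solves the system
```

In this instance the thicknesses add up to $28 - 16 - 8 = 4$, which equals the total count $2 + 3 + 4 + 5 - 4 - 6$.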
Remark 4.34. Observe that there is a dual relationship in Eqn (4.3) (with the simplification of Eqn (4.4))
between $(p, a)$ and $(\tau, b)$: the coordinates of p, when expressed on the basis formed by the components of a, are
completely determined by $\tau$ (and $\tau_0$); symmetrically, the coordinates of $\tau$, when expressed on the basis formed
by the components of the vector b, are completely determined by p. We will exploit this duality later when we
devise an iterative algorithm for solving the system. The last sum-to-one equation is a solitary constraint that
does not interfere with the other equations. Its main use is to reduce the solution set of p from
an axis of points to a single point, the intersection of that axis and the sum-to-one affine plane.
Under it, a one-to-one correspondence exists between the point of intersection and the axis. Now that we
are dealing with the art of solving polynomial equations, we are entering the realm of algebraic geometry. The
TSA-yielded polynomial system itself does not imply positivity of x; and if we allow for the moment that x varies
unrestrainedly, into any orthant it would like, then we see that the compatibility axis need not even intersect
the sum-to-one affine plane: the axis could just be parallel to the plane. This observation describes the
reality of practice: we are given the system of polynomials; we start without the positivity constraint on x;
we insist on not using the positivity constraint until we have found the solution set, and then we use it
only to constrain the solution set. A reminder on the logical flow: we have not yet definitively associated the
compatibility axis with the locus of extremities, despite the lurking temptation, which we will embrace next
when a regularity criterion is ready.

4.4. Simplicial Regularity. 

Definition 4.35. Some definitions related to 'simplex'. 

a. (Simplex). Denote by

$$T_{n-1} := \{(x_1, \ldots, x_n) \in [0, 1]^n : 1_{1 \times n} x = 1\} \qquad (4.5)$$

the n − 1 dimensional simplex as the 'first' diagonal hyper-plane of the n dimensional unit cube. Then it
is clear that its interior is

$$T_{n-1}^{\circ} = \{(x_1, \ldots, x_n) \in (0, 1)^n : 1_{1 \times n} x = 1\}. \qquad (4.6)$$

It is also clear that its boundary is

$$\partial T_{n-1} = \{(x_1, \ldots, x_n) \in T_{n-1} : \exists i \in \{1, \ldots, n\}, x_i = 0\}. \qquad (4.7)$$

b. (Uniformly regular product of order-1 fragments). Let $x \in T_{n-1}$. Let $\pi$ be a product of order-1 simplicial
x-fragments whose union is exhaustive, that is, the product is some x-superposition by [FC]. We say $\pi$ is
a uniformly regular product of order-1 x-fragments iff it converges everywhere on $T_{n-1}$; in particular, $\pi$
converges everywhere on $\partial T_{n-1}$.

c. (The zero simplicial slice). Let x G T n _i. We make the convenient definition 

n 

]Jx?:=0 (4.8) 

i=i 

and call it the zero simplicial x-slice. 

Remark 4.36. For Nr. 2, the definition of uniform regularity: it is essentially a continuity requirement, i.e.,
we are requiring/assuming the product to be a continuous function. Recall that every continuous function on a
compact domain is uniformly continuous; hence the inclusion of "uniformly" in the name.

Criterion 4.37 (Criterion for uniform simplicial regularity for a superposition). Let $x \in T_{n-1}$. Let $\Pi$ be
a product of order-1 simplicial x-fragments whose union is exhaustive. Then $\Pi$ is uniformly regular iff the
following two conditions are jointly satisfied:

a. for every negatively powered order-1 fragment $\iota^-$ of $\Pi$, the powers of all order-1 fragments in $\Pi$ covered
by $\iota^-$ (including $\iota^-$'s own power) sum to a non-negative real number.

b. for every non-exhaustive union $U^-$ of some of these $\iota^-$s such that $U^-$ does not factorize $\Pi$, the powers
of all order-1 fragments of $\Pi$ covered by $U^-$ (including the powers of all those $\iota^-$s that unionize to $U^-$, as
they are covered by $U^-$, but not including the power of $U^-$, as it is not an order-1 fragment of $\Pi$) sum to
a non-negative real number.
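Proof. (only if) We will show that if either condition fails, then $\Pi$ diverges somewhere on the boundary. Suppose
$\Pi = \Pi' \, \iota^- \prod_{l=1}^{m} \kappa_l$, for which $\iota^- = (\delta_0^T x)^{c_0}$ with $c_0 < 0$ and $\{\kappa_l = (\delta_l^T x)^{c_l}, \forall l \in \{1, \ldots, m\} : \delta_l \le \delta_0\}$ is the set
of all other order-1 x-fragments of $\Pi$ that are covered by $\iota^-$. Then $\forall x \in \{x : \delta_0^T x = 0\} \subset \partial T_{n-1}$ we have
$\iota^- \prod_{l=1}^{m} \kappa_l = 0^{\, c_0 + \sum_{l=1}^{m} c_l}$, which converges only if $c_0 + \sum_{l=1}^{m} c_l \ge 0$.

Suppose $\Pi = \Pi' \prod_{r=1}^{s} \iota_r^- \prod_{l=1}^{m} \kappa_l$, for which $U^- = \bigcup_{r=1}^{s} \iota_r^- = \bigcup_{r=1}^{s} (\delta_r^T x)^{b_r}$ with every $b_r < 0$
and $\left\{ \kappa_l = (\delta_l^T x)^{c_l}, \forall l \in \{1, \ldots, m\} : \delta_l \le \sum_{r=1}^{s} \delta_r \right\}$
is the set of all other order-1 x-fragments of $\Pi$ that are covered by $U^-$. Then $\forall x \in \left\{ x : \left( \sum_{r=1}^{s} \delta_r \right)^T x = 0 \right\} \subset \partial T_{n-1}$
we have $\prod_{r=1}^{s} \iota_r^- \prod_{l=1}^{m} \kappa_l = 0^{\, \sum_{r=1}^{s} b_r + \sum_{l=1}^{m} c_l}$, which converges only if $\sum_{r=1}^{s} b_r + \sum_{l=1}^{m} c_l \ge 0$.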

(if) We will show that $\Pi$ diverges only if either condition fails. It is clear that the product diverges only
if $\mathfrak{M}$ or $\mathfrak{N}$, where $\mathfrak{M}$ = "the denominator is 0 and the numerator is not 0" and $\mathfrak{N}$ = "the numerator is a lower
order of zero than is the denominator."

$\mathfrak{M}$ is true only if $\mathfrak{M}^*$ = "some of $\Pi$'s negatively powered order-1 fragments are zero and none of its positively
powered order-1 fragments is zero" is true. $\mathfrak{M}^*$ is true only if the first condition is false: if any negatively powered
$\iota^-$ of $\Pi$ covers any positively powered order-1 fragments then $\iota^- = 0 \implies$ numerator contains 0 as factor $\implies$
contradiction.

R R 

m is true iff II = % = with < a < /3 and uv ^ 0. Write B = v{f = v JI 4- = v FJ (8Jx) and 

r—1 r r—1 

R T T T 

(5 = J2 ( — M- Write A = u0 a = u JJ tf = u JJ {SJx) Ct and a = c t . Then 9t is true only if the second 

r=l t=l t=l t=l 

R 



R 



T 



then the set of positively powered order-1 fragment 



condition is false: Let U = (J i r = I E S r I 

r=l 

covered by U~m H = — is H = -|?; + = (5 J x) c : (^52 ^r^j £ = S J x = ol which is a subset of the t^~s above. 

Then the sum of powers of order-1 x-fragments covered by U~ in II is less than a — (3 < 0, which means that 
Vt is true only if the second condition is false. □ 

Remark 4.38. The two conditions can be merged into one: for every non-exhaustive union $U^-$ of some of
the negatively powered order-1 fragments of $\Pi$, the powers of all order-1 fragments of $\Pi$ covered
by $U^-$ sum to a non-negative real number.

The criterion is not lean — an exponentially expensive enumeration underlies it. An algorithm that closely 
implements it will probably be non-polynomial time. In our own prototyping implementation, running time in 
seconds is approximately T = ^u.imns ~ N2 where N is the number of l s in II and ~ 13 is the maximal 
N for the program to finish below 10s. Since validation is so expensive, we will have to run the optimization 
without first validating that it is optimizable. The silver lining is that all superpositions resulting from sampling 
the ionized, coarsened, and conditional spaces are uniformly regular. 
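
The merged condition of this remark translates into a direct, exponentially expensive enumeration. The following is a minimal sketch of that enumeration, not the prototype implementation mentioned above: it assumes every order-1 fragment is given as a pair (set of ion indices, power), enumerates all unions of negatively powered fragments, skips exhaustive unions, and sums the powers of the covered fragments. The function name `uniformly_regular` is hypothetical, and the three test densities are the ones analysed in Example 4.40 as we read them (the first two on five ions with $x_5$ to the power 1, the third on four ions).

```python
from itertools import combinations

def uniformly_regular(fragments, n):
    """Merged criterion: every non-exhaustive union of negatively powered
    fragments must cover fragments whose powers sum to a non-negative number.

    fragments: list of (support, power); support is a set of 1-based ion indices.
    """
    negatives = [frozenset(s) for s, c in fragments if c < 0]
    everything = frozenset(range(1, n + 1))
    for k in range(1, len(negatives) + 1):
        for combo in combinations(negatives, k):
            union = frozenset().union(*combo)
            if union == everything:
                continue  # exhaustive unions are not tested
            covered = sum(c for s, c in fragments if frozenset(s) <= union)
            if covered < 0:
                return False
    return True

def density(neg_first, with_x5):
    ions = [({i}, 100) for i in (1, 2, 3, 4)] + ([({5}, 1)] if with_x5 else [])
    return ions + [({1, 2}, 1), ({3, 4}, 20),
                   ({1, 2, 3}, neg_first), ({2, 3, 4}, -220)]

print(uniformly_regular(density(-200, True), 5))
print(uniformly_regular(density(-202, True), 5))
print(uniformly_regular(density(-202, False), 4))
```

The number of unions grows as $2^N$ in the number N of negatively powered fragments, which is the cost the remark warns about.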

Note 4.39. Potential algorithmic optimization can be done if one considers the obvious (directed acyclic) graph
underlying every superposition: each node corresponds to a $\delta_k$ and we add a weight to each node to represent
the corresponding count $b_k$; edges are arrows emitted from a $\delta_k$ pointing to every $\delta_m \gneq \delta_k$ (element-wise) such
that there is no $\delta_l$ with $\delta_m \gneq \delta_l \gneq \delta_k$. Then it is clear that the ionic nodes do not have any arrows
entering them while the largest $\delta_k$'s do not emit any arrows. Such a graph represents the fragment covering
structure of the superposition and we may call it the graph of covering.
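
A graph of covering as described in this note can be built with a few lines of code. The sketch below is one possible construction, assuming each node is identified with the support of its event pattern (a set of ion indices) and ignoring the node weights; the function name `covering_graph` is hypothetical. The supports are those of a nine-fragment superposition on five ions with unions $x_1+x_2$, $x_3+x_4$, $x_1+x_2+x_3$, and $x_2+x_3+x_4$.

```python
def covering_graph(supports):
    """Edges (lo, hi) where hi strictly contains lo with no support in between."""
    nodes = [frozenset(s) for s in supports]
    edges = []
    for lo in nodes:
        for hi in nodes:
            if lo < hi and not any(lo < mid < hi for mid in nodes):
                edges.append((lo, hi))
    return edges

supports = [{1}, {2}, {3}, {4}, {5}, {1, 2}, {3, 4}, {1, 2, 3}, {2, 3, 4}]
edges = covering_graph(supports)
for lo, hi in edges:
    print(sorted(lo), "->", sorted(hi))
```

Ionic nodes receive no arrows and the maximal supports emit none, as the note states; the isolated ion 5 has no edges at all.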

Example 4.40. We give the following examples to explain the proof above.

a. $x \in T_4$, $\dfrac{x_1^{100} x_2^{100} x_3^{100} x_4^{100} x_5 (x_1 + x_2)(x_3 + x_4)^{20}}{(x_1 + x_2 + x_3)^{200} (x_2 + x_3 + x_4)^{220}}$

• There are 3 non-exhaustive $U^-$:

  ◦ $(x_1 + x_2 + x_3)^{-200}$
    * Covers $(x_1 + x_2 + x_3)^{-200}$, $(x_1 + x_2)^{1}$, $x_1^{100}$, $x_2^{100}$, $x_3^{100}$
    * $\to$ total counts are $-200 + 1 + 100 + 100 + 100 \ge 0$

  ◦ $(x_2 + x_3 + x_4)^{-220}$
    * Covers $(x_2 + x_3 + x_4)^{-220}$, $(x_3 + x_4)^{20}$, $x_2^{100}$, $x_3^{100}$, $x_4^{100}$
    * $\to$ total counts are $-220 + 20 + 100 + 100 + 100 \ge 0$

  ◦ $(x_1 + x_2 + x_3 + x_4)^{-420} = (x_1 + x_2 + x_3)^{-200} \cup (x_2 + x_3 + x_4)^{-220}$
    * Covers $(x_1 + x_2 + x_3)^{-200}$, $(x_2 + x_3 + x_4)^{-220}$, $(x_1 + x_2)^{1}$, $(x_3 + x_4)^{20}$, $x_1^{100}$, $x_2^{100}$, $x_3^{100}$, $x_4^{100}$
    * $\to$ total counts are $-200 - 220 + 1 + 20 + 100 + 100 + 100 + 100 \ge 0$

• Therefore it is uniformly regular.

b. $x \in T_4$, $\dfrac{x_1^{100} x_2^{100} x_3^{100} x_4^{100} x_5 (x_1 + x_2)(x_3 + x_4)^{20}}{(x_1 + x_2 + x_3)^{202} (x_2 + x_3 + x_4)^{220}}$

• There are 3 non-exhaustive $U^-$s like before.

• After repeating the analysis, we will find that for the last union $U^-$, $(x_1 + x_2 + x_3 + x_4)^{-422} =
(x_1 + x_2 + x_3)^{-202} \cup (x_2 + x_3 + x_4)^{-220}$, all order-1 fragments covered by it have a negative total
count: $-202 - 220 + 1 + 20 + 100 + 100 + 100 + 100 < 0$.

• Therefore it is not uniformly regular. It diverges at the vertex of $x_5$.

• The graph of covering of this superposition is shown in Figure 4.2.

Figure 4.2: Graph of Covering for $\dfrac{x_1^{100} x_2^{100} x_3^{100} x_4^{100} x_5 (x_1 + x_2)(x_3 + x_4)^{20}}{(x_1 + x_2 + x_3)^{202} (x_2 + x_3 + x_4)^{220}}$

c. $\dfrac{x_1^{100} x_2^{100} x_3^{100} x_4^{100} (x_1 + x_2)(x_3 + x_4)^{20}}{(x_1 + x_2 + x_3)^{202} (x_2 + x_3 + x_4)^{220}}$

• For this density we have only 2 non-exhaustive $U^-$s and both of them are well covered.

  ◦ $(x_1 + x_2 + x_3)^{-202}$

  ◦ $(x_2 + x_3 + x_4)^{-220}$

• Therefore this density is uniformly regular.

Theorem 4.41. Let $x \in T_{n-1}$. Let $\pi = \prod_{j=1}^{q} (\delta_j^T x)^{\rho_j}$ be a simplicial x-superposition with only positive counts,
i.e., $\forall j$, $\rho_j > 0$. Let p be the intersection of $\pi$'s compatibility axis and $T_{n-1}$, assuming p exists. Then p is
a maximizer of $\pi(x)$.
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Proof. Let e be any of the mutually superposable slices of tt. Then it is sufficient to prove that p is the unique 
maximizcr of e(x) = n Uj x ) ■ This 

is an immediate corollary from Lemma 3.1: 

3=1 



e(x) = f[(l]^ 

3 = 1 



3=1 
<?l 

"'3 



j=i 



My: 




3=1 




whence equality is attained if x = p (by definition, at p, the ratios of the exponents to bases are equalized).
Note that "only if x = p" is not guaranteed, even though the maximal value attained at any such p must be the
same constant given above. A counterexample for "only if x = p" is $x_1 (x_2 + x_3)$, for which a ridge is present.
It is an uncomfortable fact that such a p may not be unique although $\pi(p)$ is unique. □

Theorem 4.42 (p is a stationary point of $\pi$). Let $x \in T_{n-1}$. Let $\pi$ be a simplicial x-superposition. Let p be
the intersection of $\pi$'s compatibility axis and $T_{n-1}$. Then p is a stationary point of $\pi$.

Proof. We start with separately processing $\pi$'s numerator and denominator. First, for each order-1 fragment
on the numerator, we complete it with a product of ionic fragments so that together their product becomes an
x-slice maximized at p. At the same time we multiply the denominator by this same product of ionic fragments
so that the whole fraction remains unchanged. Then the numerator has become a product of superposable
x-slices all maximized at p. For each slice, we apply Lemma 3.1 in the increasing direction: to merge the bases
from a product to a sum. For a slice, that sum is 1, which means we can replace it with a constant. Thus
the whole numerator has become a product of constants. Moreover, this product of constants maximizes the
numerator.

On the denominator, for each order-1 fragment, we apply the decreasing direction of Lemma 3.1: to split an
order-1 fragment into a product of ionic fragments, according to the proportion of p, times a constant. Thus the
whole denominator will now become a product of all ionic fragments. Moreover, this product of ionic fragments
is smaller than the original denominator and it is minimized at p.

The fraction has now become a constant divided by a multinomial kernel multiplied by another constant. If
we denote the fraction in this form by f, then

$$\pi(x) \le f(x), \qquad \pi(p) = f(p),$$

because in processing both the numerator and the denominator, we always increase the whole fraction while
maintaining equality at p in every step. Moreover, f is stationarily minimal at p because its denominator, a
multinomial kernel, is stationarily maximal at p. Thus p must be a stationary point of $\pi$, which is dominated
by f and touches f only at p. □

Remark 4.43. Theorem 4.42 should put the algebraic method developed here on equal footing to the calculus 
method of solving the likelihood equations. 

Proposition 4.44 (Main inequality (conjectured)). Let $x \in T_{n-1}$. Let $\pi$ be a uniformly regular simplicial
x-superposition. Assume [FC] to be true. Let p be the intersection of $\pi$'s compatibility axis and $T_{n-1}$. Then
$\pi(x) \le \pi(p)$, $\forall x$, attaining equality (i.e. maximum) iff x = p.

Remark 4.45. If true, this proposition will grant many properties including uniqueness of maximum of any 
such likelihood surface. This inequality is key to understand how the algebraic nature of equality attainment 
condition plays the necessary and sufficient role of the optimality condition, given the Fundamental Conjecture 
(i.e., existence of p) is true. In practice, the main inequality is broken down to two bridging inequalities, namely 
one to apply our seminal Lemma 3.1 to shrink the denominator by splitting it and the other to apply our seminal 
Lemma 3.1 again to aggrandize the numerator by merging its parts. Examples follow. 

Example 4.46. We randomly generate a few examples and show how the inequality, with its equality attainment 
condition, solves the optimization problem. 
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a. 



kx\ kx2 ~,kx3 



\k> 



• xi : x 2 : x 3 = 10 : 14 : 20.4 
This example unravels the optimization process of $\Pi_1$, for $x \in T_4$.



IIi 



x\x\x%x\ °xl (xi + ar 2 ) (x 3 +X4 + x 5 ) 
(xi + x 2 + x 3 ) 9 (x 4 + x 5 ) n 



XJX2X3X4 U Xg {X\ + X 2 ) (X3+X4+X5) 



-Oi 

»4„.6„.4„5„,6/ 



\5 4 5 6 
X 2 J ^3X4X5 



CiX 1 X 2 X 3 X 4 X 5 (x3 + X4, + X 5 ) 
Cl [x^xfx^xlxf] X^X2(X3 + X4 + Xs) 15 

20 



< Ci c 2 (xi + X 2 + X 3 + X4 + x 5 ) 
= cic 2 c 3 (xi + x 2 + x 3 + x 4 + x 5 ) 40 

5 5 4 4 5 5g6 22 3 3 4 4 5 5 6 6 2 2 3 3l 5 15 



c 3 (xi + X 2 + X 3 + X4 + £5) 



211 



9 9 11 



11 



20 



2(1 



20 



211 



All equalities can be jointly attained at [2, 3, 4, 5, 6]/20. We see that compatibility and uniform regularity 
together enable the adsorption of all the denominator order- 1 fragments into the numerator while Lemma 
3.1 assures ascent of the whole expression in both the ionization of the denominator and the unionization 
in the numerator. 

Now consider a variation Hi of Hi, still constructed from the same axis span([2, 3, 4, 5, 6]), for x £ T4: 



Ui 



X-^X2^^X 4 Xt^ (X3 ~f~ X4. -\- X5) 



15 



(x 3 + Xi)" 



X-^X2X^X^ X^{X"^ ~\~ Xa. ~\~ "^5) 



15 



J_™4„5 
ci X 3 X 4 



CiXiX^X^X^X^X^ + X4 + X5) 
XiX 2 (x 3 + 



C\ ^X-^ X^Xr^X^Xt^ 



< cic 2 c 3 (xi + x 3 + x 3 + x 4 - 

4 4 5 5 2233 4 4 5 5 6 6 2 2 3 3l 5 15 



15 

x 4 + x 5 ) 

40 



15 



X5) 



9 9 



20^ 



20^ 



attaining all equalities at [2, 3, 4, 5, 6]/20. Note that the ratio of Hi and Hi at every point along the 
compatibility axis is always > 1- 

In this example we give the geometrical representation. Consider the superposition

$$\frac{x_1^{4} x_2^{10} x_3^{9} x_4^{24} (x_1 + x_2)^{6} (x_1 + x_3)^{8} (x_2 + x_3)^{10} (x_3 + x_4)^{14}}{(x_1 + x_2 + x_3)^{6} (x_2 + x_3 + x_4)^{9}}$$

attaining maximum at $(p_1, p_2, p_3, p_4) = (0.1, 0.2, 0.3, 0.4)$. We represent it as an assembly of fragments in
Figure 4.3. There are positive thickness fragments (solid line) and negative thickness fragments (dotted
line). There are two directions of operations: horizontal (join and split) and vertical (stack and section).
Horizontal operations will increase or decrease the total product. Whenever we join two positive thickness
fragments whose patterns complement each other, we increase the total product. Whenever we
split a negative thickness fragment into a set of smaller fragments with the same negative thickness, we
increase the total product. In short, in order to increase the total product, we join solid-line components and
split dotted-line components. Vertical operations do not change the total product. We may stack or section
any number of times without changing the total product. In Figure 4.3, numbers inside shapes are labels
while numbers outside shapes are thicknesses (given we already know the compatibility axis). The mission
is to use the above-described 4 operations:

1. joining solid-line fragments

2. splitting dotted-line fragments

3. stacking fragments

4. sectioning fragments

to achieve a complete slice with the positive thickness 70.

Figure 4.3: Fragments, Slices, and Superposition: The Physical Picture
We can solve the puzzle as the following: 

1. Join fragment "12(thick 20)" and fragment "34(thick 20)" to form a complete slice "1234(thick 20)".

2. Split fragment "123(thick -10)" into "13(thick -10)" and "2(thick -10)".

3. Stack "13(thick -10)" on "13(thick 20)" to yield "13(thick 10)".

4. Stack "2(thick -10)" on "2(thick 50)" to yield "2(thick 40)".

5. Split "234(thick -10)" into "23(thick -10)" and "4(thick -10)".

6. Stack "23(thick -10)" on "23(thick 20)" to yield "23(thick 10)".

7. Stack "4(thick -10)" on "4(thick 60)" to yield "4(thick 50)".

8. Now stack and join all fragments: "2(thick 40)", "13(thick 10)", "1(thick 40)", "3(thick 30)",
"23(thick 10)", "4(thick 50)", and "1234(thick 20)" to form "1234(thick 70)".

Note 4.47. The extreme case would be to strip off 30 units of thickness from each of the 4 atomic fragments,
vanishing $x_3$.
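
The thickness bookkeeping of the puzzle can be checked mechanically. The Python sketch below is only an illustration: the helper name `category_thickness` is hypothetical, and it assumes the reading that the thickness accumulated over a category is the sum of the thicknesses of all fragments whose label contains that category. The labels and thicknesses are the ones quoted in the steps of the solution.

```python
# Hypothetical bookkeeping helper (not part of the paper's software).
def category_thickness(fragments, categories):
    """For each category, add up the thicknesses of the fragments covering it."""
    return {c: sum(t for labels, t in fragments if c in labels) for c in categories}


categories = "1234"

# Fragments of the puzzle before any operation: (label pattern, thickness).
initial = [("1", 40), ("2", 50), ("3", 30), ("4", 60),
           ("12", 20), ("34", 20), ("13", 20), ("23", 20),
           ("123", -10), ("234", -10)]

# Fragments left after the joins, splits and stacks, just before the final assembly.
before_final_assembly = [("2", 40), ("13", 10), ("1", 40), ("3", 30),
                         ("23", 10), ("4", 50), ("1234", 20)]

initial_totals = category_thickness(initial, categories)
final_totals = category_thickness(before_final_assembly, categories)
print(initial_totals)
print(final_totals)
```

Under this reading every category carries the same total thickness before and after the operations, which is what allows the fragments to be assembled into one complete slice.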
Corollary 4.48. If [FC] is true, then two uniformly regular simplicial x-slices are superposable only if they
share maximizers.

Proposition 4.49 (The commutative semi-ring of uniformly regular simplicial slices). Let e be a uniformly
regular simplicial x-slice. Let W be the set of uniformly regular simplicial x-slices superposable with e. If
both [FC] and Proposition 4.44 are true, then with the usual addition "+" and multiplication "·" between
two functions, (W, +, ·) is a commutative semi-ring, with $\prod_{i=1}^{n} x_i^{\infty}$ as the additive identity and
$\prod_{i=1}^{n} x_i^{0}$ as the multiplicative identity.

Proof. As we are using the usual addition and multiplication between two functions, associativity of the two
operations and the distributive law of multiplication over addition hold readily. It is also obvious that the
additive identity is $\prod_{i=1}^{n} x_i^{\infty} = 0$ and the multiplicative identity is $\prod_{i=1}^{n} x_i^{0} = 1$.
Closure: Let $f$ be a simplicial x-slice such that $f(x) \le f(x^*)$. Let $g$ be another simplicial x-slice such that
$g(x) \le g(x^*)$. Note that both $f$ and $g$ are positive functions. Then it is clear that
$(f + g)(x) = f(x) + g(x) \le f(x^*) + g(x^*) = (f + g)(x^*)$ and
$(f \cdot g)(x) = f(x) \cdot g(x) \le f(x^*) \cdot g(x^*) = (f \cdot g)(x^*)$. □



5 Algorithms 

One quick application of the algebraic understanding of the relationship between p and p will lead us to devise 
a successful iteration. The intuition is that any point on the simplex should reconstruct its own ionic counts, if 
the unionic counts are fixed. Then by seeing how far these reconstructed ionic counts deviate from the actual 
ionic counts and how these deviations can vary, we get a piece of useful information for deciding how to proceed 
to the true maximum. A more specific description follows. 

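We first run the trivial slicing algorithm on $\mathbf{x}$. By fixing the vector $\mathbf{b}$ and the bit matrix $A$, we can evaluate
all except one of the auxiliary coefficients for the composite slices by

$$\tau_j(\mathbf{x}) = \frac{b_j}{\sum_{i=1}^{n} A_{ij} x_i}, \qquad j = 1, \ldots, q,$$

or in vectorial form

$$\boldsymbol{\tau}(\mathbf{x}) = \frac{\mathbf{b}}{A^\top \mathbf{x}}$$

where $\boldsymbol{\tau}$ is the column vector $(\tau_1, \ldots, \tau_q)^\top$ and the division is understood as element-wise. It turns out that $\boldsymbol{\tau}$
represents the vector of slices' thicknesses. The exception is the coefficient $\tau_0$ for the ionic slice, which we will
keep as an unknown variable to solve. We then reconstruct the ionic counts by

$$R_i(\mathbf{x}, \tau_0) = x_i \left\{ \tau_0 + \sum_{j=1}^{q} (1 - A_{ij})\, \tau_j(\mathbf{x}) \right\}$$

or in vectorial form

$$\mathbf{R}(\mathbf{x}, \tau_0) = \mathbf{x} * \left( \tau_0 \mathbf{1}_{n \times 1} + (\mathbf{1}_{n \times q} - A)\, \boldsymbol{\tau}(\mathbf{x}) \right)$$

where the vectorial multiplication $*$ is understood as element-wise, $\mathbf{1}_{n \times 1}$ is the column vector of 1s, and $\mathbf{1}_{n \times q}$ is the
n by q matrix of 1s. We then find the individual reconstruction deviations by

$$d_i(\mathbf{x}, \tau_0) = R_i(\mathbf{x}, \tau_0) - a_i$$

or in vectorial form

$$\mathbf{d}(\mathbf{x}, \tau_0) = \mathbf{R}(\mathbf{x}, \tau_0) - \mathbf{a}$$

and the sum of squared deviations

$$e = \mathbf{d}(\mathbf{x}, \tau_0)^\top \mathbf{d}(\mathbf{x}, \tau_0).$$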

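It is clear that our objective is to make this $e$ zero, by choosing $\mathbf{x}$ and $\tau_0$. Before delving into the details of the
algorithm, we observe that $\tau_0$ is in quadratic relation with $e$, as quickly shown in the following.

Lemma 5.1 (Quadratic relationship between SSE and the zero-th auxiliary). $e(\tau_0 \mid \mathbf{x}) = a(\mathbf{x})\tau_0^2 + b(\mathbf{x})\tau_0 + c(\mathbf{x})$,
where

$$\begin{aligned}
a(\mathbf{x}) &= \mathbf{x}^\top \mathbf{x}, \\
b(\mathbf{x}) &= 2\mathbf{x}^\top \left\{ \operatorname{diag}(\mathbf{x}) (\mathbf{1}_{n \times q} - A)\, \boldsymbol{\tau}(\mathbf{x}) - \mathbf{a} \right\}, \\
c(\mathbf{x}) &= \boldsymbol{\tau}(\mathbf{x})^\top (\mathbf{1}_{q \times n} - A^\top) \operatorname{diag}(\mathbf{x})^\top \operatorname{diag}(\mathbf{x}) (\mathbf{1}_{n \times q} - A)\, \boldsymbol{\tau}(\mathbf{x}) \\
&\quad - 2\boldsymbol{\tau}(\mathbf{x})^\top (\mathbf{1}_{q \times n} - A^\top) \operatorname{diag}(\mathbf{x})^\top \mathbf{a} + \mathbf{a}^\top \mathbf{a}.
\end{aligned}$$

Proof. It is clear that when $\mathbf{x}$ is fixed, $\mathbf{R}(\tau_0 \mid \mathbf{x})$ is linear in $\tau_0$ with the form $\tau_0 \mathbf{x} + \boldsymbol{\eta}(\mathbf{x})$. Therefore
$\mathbf{d}(\tau_0 \mid \mathbf{x}) = \tau_0 \mathbf{x} + \boldsymbol{\eta}(\mathbf{x}) - \mathbf{a}$. Therefore $e(\tau_0 \mid \mathbf{x})$ is quadratic in $\tau_0$ with second-order term $\tau_0^2\, \mathbf{x}^\top \mathbf{x}$ and
first-order term $2\tau_0\, \mathbf{x}^\top (\boldsymbol{\eta}(\mathbf{x}) - \mathbf{a})$, which expands to $2\tau_0\, \mathbf{x}^\top \{ \operatorname{diag}(\mathbf{x}) (\mathbf{1}_{n \times q} - A)\, \boldsymbol{\tau}(\mathbf{x}) - \mathbf{a} \}$, and constant term
$(\boldsymbol{\eta}(\mathbf{x}) - \mathbf{a})^\top (\boldsymbol{\eta}(\mathbf{x}) - \mathbf{a})$, which expands to the above-stated form of $c(\mathbf{x})$. □

Corollary 5.2 (A necessary condition for solution). $e = 0$ only if $b(\mathbf{x})^2 = 4a(\mathbf{x})c(\mathbf{x})$.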
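
Lemma 5.1 and Corollary 5.2 can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the quadratic coefficients are written through η(x) = diag(x)(1 − A)τ(x) as in the proof, and the toy data (three categories, one unionic event covering categories 1 and 2) are invented for the check. For these data the likelihood is x_1^2 x_2^3 x_3^5 (x_1 + x_2)^5, whose maximizer (4/15, 6/15, 5/15) can be found by hand.

```python
def tau(x, A, b):
    """tau_j(x) = b_j / sum_i A[i][j] * x[i], i.e. b divided element-wise by A^T x."""
    n, q = len(x), len(b)
    return [b[j] / sum(A[i][j] * x[i] for i in range(n)) for j in range(q)]


def eta(x, A, b):
    """eta(x) = diag(x) (1 - A) tau(x): the part of R that does not involve tau_0."""
    t = tau(x, A, b)
    return [x[i] * sum((1 - A[i][j]) * t[j] for j in range(len(b)))
            for i in range(len(x))]


def sse(x, tau0, a, A, b):
    """e = d^T d with d = tau0 * x + eta(x) - a, computed directly."""
    h = eta(x, A, b)
    return sum((tau0 * x[i] + h[i] - a[i]) ** 2 for i in range(len(x)))


def quadratic_coefficients(x, a, A, b):
    """Scalar coefficients a(x), b(x), c(x) of Lemma 5.1."""
    h = eta(x, A, b)
    qa = sum(xi * xi for xi in x)
    qb = 2.0 * sum(x[i] * (h[i] - a[i]) for i in range(len(x)))
    qc = sum((h[i] - a[i]) ** 2 for i in range(len(x)))
    return qa, qb, qc


# Toy data (assumption of this sketch).
a = [2.0, 3.0, 5.0]          # ionic counts
b = [5.0]                    # unionic counts
A = [[1], [1], [0]]          # event pattern matrix, n = 3 rows, q = 1 column

# At an arbitrary point the direct SSE agrees with the quadratic in tau_0.
x_any = [0.2, 0.3, 0.5]
qa, qb, qc = quadratic_coefficients(x_any, a, A, b)
direct = sse(x_any, 3.0, a, A, b)
via_lemma = qa * 3.0 ** 2 + qb * 3.0 + qc

# At the hand-computed maximizer the discriminant vanishes (Corollary 5.2).
x_star = [4 / 15, 6 / 15, 5 / 15]
sa, sb, sc = quadratic_coefficients(x_star, a, A, b)
discriminant = sb * sb - 4.0 * sa * sc
tau0_star = -sb / (2.0 * sa)   # the double root of the quadratic
```

At `x_star` the double root `tau0_star` is the ionic thickness that makes every reconstruction deviation vanish.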



Algorithm 1: The Greedy Weaver Algorithm

Data: The ionic counts vector a, the unionic counts vector b, and the event pattern matrix A.
Ensure: a is a vector of positive reals; A is a 0-1 matrix, has the same number of rows as the length of a,
        and has the same number of columns as the length of b.
Result: The eigenestimate p.
n ← length(a)
q ← length(b)
x ← a / sum(a)
τ ← function(x) b ÷ (A^T x)                                          // ÷ is element-wise division
τ_0 ← function(x) x^T {a - diag(x) (1_{n×q} - A) τ(x)} / (x^T x)     // according to Lemma 5.1
d ← function(x) x * {τ_0(x) 1_{n×1} + (1_{n×q} - A) τ(x)} - a        // * is elem-wise prod
e ← function(x) d(x)^T d(x)                                          // sum of squared error
I ← function(x) {return the index of the component of d(x) with the largest absolute value}
loopcount ← 0
while e(x) > 0.0000000000001 do
    i ← I(x)                                   // the index of largest deviation
    /* perturb to learn the relationship between x[i] and d[i] using an interpolating
       parabola and trying to vanish d[i] by setting x[i] to a root of the parabola */
    /* making 3 x-values to interpolate a parabola: */
    u ← Array{0, x[i], 1.05 x[i]}              // [] for array indexing, starts from 1
    temp ← x
    temp[i] ← 1.05 x[i]
    // update the temp[n]
    temp[n] ← 0
    temp[n] ← 1 - sum(temp)
    /* evaluate the corresponding y-values: */
    v ← Array{-a[i], d(x)[i], d(temp)[i]}      // d(x)[i] is ith component of d(x)
    /* find the coefs of the parabola y = αx^2 + βx + γ */
    γ ← v[1]                                   // array index starts from 1
    α ← (u[3] (v[2] - v[1]) - u[2] (v[3] - v[1])) / (u[2]^2 u[3] - u[2] u[3]^2)
    β ← (-u[3]^2 (v[2] - v[1]) + u[2]^2 (v[3] - v[1])) / (u[2]^2 u[3] - u[2] u[3]^2)
    /* pick the suitable zero of the parabola, usu. the larger one: */
    x[i] ← (-β ± sqrt(β^2 - 4αγ)) / (2α)
    /* properly update x[n] or normalize x, may add code */
    x[n] ← 0
    x[n] ← 1 - sum(x)
    loopcount ← loopcount + 1
end
p ← x                                          // The result



We now give our first greedy algorithm (Algorithm 1), named "Greedy Weaver". It is "greedy" because it
carries out a sequence of conditional optimizations on the "worst" x coordinate, hoping that this sequence of
conditional optimizations will bring about joint optimality. The pseudocode in Algorithm 1 is ready for a
starting implementation.
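
The interpolation step inside the loop of Algorithm 1 can be isolated as a small routine. The Python sketch below is illustrative only; the function names `fit_parabola` and `parabola_roots` are hypothetical, indices start from 0 rather than 1, and the routine assumes that the first abscissa is 0 and that the other two are distinct and nonzero, as in the pseudocode.

```python
import math


def fit_parabola(u, v):
    """Coefficients (alpha, beta, gamma) of y = alpha*x^2 + beta*x + gamma passing
    through (0, v[0]), (u[1], v[1]) and (u[2], v[2]); assumes u[0] == 0."""
    gamma = v[0]
    det = u[1] ** 2 * u[2] - u[1] * u[2] ** 2
    alpha = (u[2] * (v[1] - v[0]) - u[1] * (v[2] - v[0])) / det
    beta = (-u[2] ** 2 * (v[1] - v[0]) + u[1] ** 2 * (v[2] - v[0])) / det
    return alpha, beta, gamma


def parabola_roots(alpha, beta, gamma):
    """Real roots of the parabola in increasing order (empty list if none).
    Assumes alpha != 0."""
    disc = beta * beta - 4.0 * alpha * gamma
    if disc < 0.0:
        return []
    r = math.sqrt(disc)
    return sorted([(-beta - r) / (2.0 * alpha), (-beta + r) / (2.0 * alpha)])


# Check on a known parabola y = 2x^2 + 3x - 5, sampled at 0, 0.4 and 1.05 * 0.4.
def known(x):
    return 2.0 * x * x + 3.0 * x - 5.0


u = [0.0, 0.4, 1.05 * 0.4]
v = [known(t) for t in u]
alpha, beta, gamma = fit_parabola(u, v)
roots = parabola_roots(alpha, beta, gamma)
```

In the Greedy Weaver the first ordinate is -a[i], since the deviation d[i] equals -a[i] when x[i] is 0; the choice between the two roots is left to the caller, as in the pseudocode.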

Next we give a second algorithm. Recall that in Remark 4.34 we mentioned a dual relationship
between (a, x) and (b, τ). The foregoing algorithm has not utilized this observation. We specify the following
Algorithm 2, named "Weaver", to exploit this observation. It removes the apparent "greedy" parts from "Greedy
Weaver." The Weaver algorithm is the one used to solve the game in Example 2.4.

Algorithm 2: The Weaver Algorithm

Data: The ionic counts vector a, the unionic counts vector b, and the event pattern matrix A.
Ensure: a is a vector of positive reals; A is a 0-1 matrix, has the same number of rows as the length of a,
        and has the same number of columns as the length of b.
Result: The eigenestimate p.
n ← length(a)
q ← length(b)
x ← a / sum(a)
τ ← function(x) b ÷ (A^T x)                                          // ÷ is element-wise division
τ_0 ← function(x) x^T {a - diag(x) (1_{n×q} - A) τ(x)} / (x^T x)     // according to Lemma 5.1
d ← function(x) x * {τ_0(x) 1_{n×1} + (1_{n×q} - A) τ(x)} - a        // * is elem-wise prod
e ← function(x) d(x)^T d(x)                                          // sum of squared error
loopcount ← 0
smallestSSE ← e(x)
bestx ← x
SSE ← e(x)
while SSE > 0.0000000000001 do
    x ← a ÷ ((1_{n×q} - A) τ(x) + τ_0(x) 1_{n×1})                    // ÷ is elem-wise division
    x ← x / sum(x)
    SSE ← e(x)
    // some optional bookkeeping:
    if SSE < smallestSSE then
        smallestSSE ← SSE
        bestx ← x
    end
    loopcount ← loopcount + 1
end
p ← bestx                                                            // The result

The "Greedy Weaver" and "Weaver" algorithms are children of the Reconstruction philosophy and the TSA
method. Empirically, if one puts the two in a try...catch block that prioritizes Weaver over Greedy
Weaver, throws an exception carrying the bestx value whenever Weaver goes astray, and uses that bestx to
initialize the Greedy Weaver, the pair works very well for a large proportion of inputs.

try {
    Weaver with exception handling code
} catch (exception) {
    Greedy Weaver with x initialized to exception.bestx
}
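
A minimal Python sketch of the Weaver iteration of Algorithm 2, combined with the exception-carrying pattern just described, is given below. It is not the authors' MATLAB or VBA code. The names `weaver` and `WeaverAstray`, the iteration cap `max_iter`, and the positivity check used to decide that the iteration has gone astray are all assumptions of this sketch, and the toy data are invented (their maximizer (4/15, 6/15, 5/15) can be found by hand).

```python
class WeaverAstray(Exception):
    """Hypothetical exception carrying the best iterate seen so far."""

    def __init__(self, bestx):
        super().__init__("Weaver iteration went astray")
        self.bestx = bestx


def weaver(a, b, A, tol=1e-13, max_iter=10000):
    """Sketch of Algorithm 2; a, b are lists and A is an n-by-q 0-1 matrix."""
    n, q = len(a), len(b)

    def parts(x):
        # tau = b / (A^T x), element-wise.
        tau = [b[j] / sum(A[i][j] * x[i] for i in range(n)) for j in range(q)]
        # w = (1 - A) tau: composite thickness seen by each category.
        w = [sum((1 - A[i][j]) * tau[j] for j in range(q)) for i in range(n)]
        # tau_0 minimizes the quadratic of Lemma 5.1.
        tau0 = (sum(x[i] * (a[i] - x[i] * w[i]) for i in range(n))
                / sum(xi * xi for xi in x))
        d = [x[i] * (tau0 + w[i]) - a[i] for i in range(n)]
        return tau0, w, sum(di * di for di in d)

    total = sum(a)
    x = [ai / total for ai in a]
    tau0, w, sse = parts(x)
    bestx, best_sse = x, sse
    iterations = 0
    while sse > tol:
        if iterations >= max_iter:
            raise WeaverAstray(bestx)
        x = [a[i] / (w[i] + tau0) for i in range(n)]   # reconstruct x from the counts
        s = sum(x)
        x = [xi / s for xi in x]                       # back onto the simplex
        if min(x) <= 0.0:
            raise WeaverAstray(bestx)
        tau0, w, sse = parts(x)
        if sse < best_sse:
            bestx, best_sse = x, sse
        iterations += 1
    return x


# Toy data (assumption): likelihood x1^2 x2^3 x3^5 (x1 + x2)^5.
a = [2.0, 3.0, 5.0]
b = [5.0]
A = [[1], [1], [0]]

try:
    p = weaver(a, b, A)
except WeaverAstray as exc:
    # A Greedy Weaver implementation would be started from exc.bestx here.
    p = exc.bestx
```

The cap and the positivity check are only one possible way to detect that Weaver has gone astray; the text does not specify the criterion.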

We have not mathematically studied their convergence properties or investigated what it means when the alliance
fails on some input. Potentially, however, the heavy reliance on the availability of ionic thickness (as TSA always
uses ionic fragments to fill the vacancies) can cause divergence for some regular inputs.

Two actual implementations, in MATLAB and Excel VBA, are available for download at the website
accompanying this paper. The Excel VBA implementation is minimal and specializes in portability and ease of
use. The MATLAB implementation adds symbolic manipulations and requires version 2009a (7.8) or above
to run.



6 Context 

Some contextual works that are directly linked in topic with this paper are Hankin [2010], Ng et al. [2011], and
Hunter [2004]. Ng et al. [2011] collects a number of Dirichlet-related distributions; its Section 8.1 surveys
results for the generalized Dirichlet distribution (cf. Dickey et al. [1987], Tian et al. [2003]), to which Theorem
4.41 of this paper applies.

6.1. The Hyper-Dirichlet Distribution of Hankin (2010). Counts are allocated into n categories, allowing
unionic categories and the exclusion of certain categories. Each category possesses a characterizing attribute
called "probability", collectively denoted by the simplicial vector p, to which the counts are proportional,
up to an error expected to equal zero. The allocation process looks random for each individual count, but
the counts collectively stabilize to a statistical distribution whose conjugate distribution is motivated as the
"hyper-Dirichlet" distribution by Hankin [2010], with its density stipulated in Hankin [2010, Eqn (5)] as

$$f(\mathbf{p}) \propto \left( \prod_{i=1}^{k} p_i \right)^{-1} \prod_{G \in \wp(K)} \left( \sum_{i \in G} p_i \right)^{\mathcal{F}(G)}$$

where $K$ is the set of positive integers not exceeding $k$, $\wp(K)$ is its power set, and $\mathcal{F}$ is a function that maps
$\wp(K)$ to the real numbers. Hankin [2010] also prototypes a software package using a 1-D data structure based on
the power set.⁶ The hyper-Dirichlet distribution (and its conjugate) is the model that pairs with our technique.
On the website accompanying this paper, we demonstrate the effectiveness of our pair of Weaver algorithms on the
massive volleyball data (cf. Hankin [2010], Section 3.3, Team sports).⁷ The Greedy Weaver handles this with only
87 iterations to achieve a precision of SSE = $10^{-20}$, significantly faster than all methods known to us that can
correctly output a solution. For example, the Nelder-Mead Simplex method would take thousands of iterations
to get close to this precision, mainly because the region around the MLE is too flat (cf. Hankin [2010], Section
3.3, for R code based on the hyperdirichlet package).



6.2. The Nelder-Mead Simplex Algorithm and The MM Algorithm of Hunter (2004). The Nelder-Mead
Simplex optimization algorithm is used in the software supplement to Hankin [2010] to find the mode of the
hyper-Dirichlet distribution. The Nelder-Mead Simplex method (cf. Nelder and Mead [1965], Lagarias et al.
[1998]) senses the target function by inspecting its value at the finite collection of the vertices of a simplex. It
adapts itself, per iteration, by reflection, shrinking, and expansion, to ensure that the lowest vertex is climbing.
Usually brute-force search methods, though effective, are not efficient. But the Simplex method turns out to
be very efficient, and it is the default optimization method employed by the R function stats::optim() in the
basic package stats for general-purpose optimization.

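Hunter [2004] develops an MM (Minorization-Maximization) iteration for optimizing the Bradley-Terry
model (cf. Bradley and Terry [1952], David [1988]). We can understand the iteration as a Cauchy sequence.
Write the log-density with a Lagrange multiplier term $-\lambda (x_1 + \cdots + x_n - 1) = 0$:

$$\ln f(\mathbf{x} \mid \mathbf{a}, \mathbf{b}, A) = -\ln c + \sum_{i=1}^{n} a_i \ln x_i + \sum_{j=1}^{q} b_j \ln \left( \boldsymbol{\delta}_j^\top \mathbf{x} \right) - \lambda (x_1 + \cdots + x_n - 1) \qquad (6.1)$$

Write the likelihood equations with the Lagrange multiplier:

$$0 = \frac{a_i}{x_i} + \sum_{j=1}^{q} \frac{b_j A_{ij}}{\boldsymbol{\delta}_j^\top \mathbf{x}} - \lambda, \quad \forall i = 1, \ldots, n, \qquad 1 = \mathbf{1}^\top \mathbf{x},$$

simplifying to

$$\mathbf{x} = \frac{\mathbf{a}}{\lambda \mathbf{1}_{n \times 1} - A \left( \frac{\mathbf{b}}{A^\top \mathbf{x}} \right)}, \qquad 1 = \mathbf{1}^\top \mathbf{x} \qquad (6.2)$$

where any division between two vectors is element-wise. This gives rise to the following iteration of Hunter
(2004):

$$\mathbf{x}^{(t+1)} \leftarrow \frac{\mathbf{a}}{\lambda \mathbf{1}_{n \times 1} - A \left( \frac{\mathbf{b}}{A^\top \mathbf{x}^{(t)}} \right)} \quad \text{subject to } 1 = \mathbf{1}^\top \mathbf{x}^{(t+1)} \qquad (6.3)$$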

Another facade of the same iteration is the transfer of optimization (cf. [Lange et al., 2000]) from a sum of
curved logarithms to a sum of linear slopes. In this particular case, the slopes are partial derivatives.
But the partiality does not affect the overall optimality if one selectively linearizes only those terms with negative powers,
due to [Hunter, 2004, Eqn (9)]. Compared with the Simplex method, the MM iteration of Hunter (2004)
trades the Simplex method's vast admissible range of inputs for simplicity in implementation and improved speed.
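
Iteration (6.3) can be sketched in a few lines of Python. The sketch is illustrative and rests on assumptions that are not spelled out in the text: the names `mm_step` and `score_residuals` are hypothetical; the multiplier is fixed at λ = sum(a) + sum(b), a value obtained by multiplying the i-th likelihood equation by x_i and summing over i; the constraint is enforced by renormalizing after each step; and the toy Bradley-Terry data are invented (three players, a holds the win counts, each column of A marks one pair, b holds minus the number of games of that pair).

```python
def mm_step(x, a, b, A, lam):
    """One pass of x <- a / (lam*1 - A (b / (A^T x))), then renormalize to sum 1."""
    n, q = len(a), len(b)
    ratio = [b[j] / sum(A[i][j] * x[i] for i in range(n)) for j in range(q)]
    new_x = [a[i] / (lam - sum(A[i][j] * ratio[j] for j in range(q)))
             for i in range(n)]
    total = sum(new_x)
    return [xi / total for xi in new_x]


def score_residuals(x, a, b, A, lam):
    """Left-hand sides of the likelihood equations: a_i/x_i + sum_j b_j A_ij/(delta_j^T x) - lam."""
    n, q = len(a), len(b)
    dx = [sum(A[i][j] * x[i] for i in range(n)) for j in range(q)]
    return [a[i] / x[i] + sum(b[j] * A[i][j] / dx[j] for j in range(q)) - lam
            for i in range(n)]


# Toy Bradley-Terry data (assumption): pairs {1,2}, {1,3}, {2,3} play 3, 4, 3 games.
a = [5.0, 3.0, 2.0]                      # wins of players 1, 2, 3
b = [-3.0, -4.0, -3.0]                   # minus the number of games per pair
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
lam = sum(a) + sum(b)                    # equals 0 for Bradley-Terry data

x = [1.0 / 3.0] * 3
for _ in range(500):
    x = mm_step(x, a, b, A, lam)

residuals = score_residuals(x, a, b, A, lam)
```

With this data each step reduces to the familiar Bradley-Terry update, wins divided by the sum of games over pairwise strength totals, and the residuals of the likelihood equations shrink toward zero.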



6.3. Hessian matrix around the mode of a Hyper-Dirichlet density surface. The vintage Newton-Raphson
method can also be used, but as is well known, the matrix inversion it requires may halt the iterations
prematurely. We omit the details of the algorithmic mathematics here, except to mention the bonus
of an asymptotic variance estimate automatically embedded in Newton-Raphson. Separately, we derive the explicit
form of the variance around the maximum as follows.

6 The problem with this data structure is that the data is usually sparse, leaving the power set most often unnecessarily
exponential in time, space, and I/O.

7 The full dataset was originally encapsulated in the volleyball data of the R package hyperdirichlet, which contains a
hyperdirichlet object called vb_synthetic; it has a 500x9 weaving grid.
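Lemma 6.1. Following the representation in [Ng et al., 2011, Eqn (8.18)] of [Hankin, 2010, Eqn (5)],

$$f(\mathbf{x} \mid \mathbf{a}, \mathbf{b}, A) \propto \prod_{i=1}^{n} x_i^{a_i} \prod_{j=1}^{q} \left( \boldsymbol{\delta}_j^\top \mathbf{x} \right)^{b_j},$$

the Hessian matrix of a hyper-Dirichlet log-density is given by

$$H(\mathbf{x}_{-n}) = -\operatorname{diag}\left( \frac{a_1}{x_1^2}, \ldots, \frac{a_{n-1}}{x_{n-1}^2} \right) - \frac{a_n}{x_n^2} \mathbf{1}_{(n-1) \times (n-1)} - \sum_{j=1}^{q} \begin{pmatrix} \phi_{11}(j) & \cdots & \phi_{1,(n-1)}(j) \\ \vdots & \ddots & \vdots \\ \phi_{(n-1),1}(j) & \cdots & \phi_{(n-1),(n-1)}(j) \end{pmatrix}_{(n-1) \times (n-1)}$$

where for $j = 1, \ldots, q$ and for $1 \le i, k \le n - 1$

$$\phi_{ik}(j) = \frac{b_j (A_{ij} - A_{nj}) (A_{kj} - A_{nj})}{\left( \boldsymbol{\delta}_j^\top \mathbf{x} \right)^2}.$$

Proof. Mainly derivations. The log-density is

$$\ln f(\mathbf{x} \mid \mathbf{a}, \mathbf{b}, A) = -\ln c + \sum_{i=1}^{n} a_i \ln x_i + \sum_{j=1}^{q} b_j \ln \left( \boldsymbol{\delta}_j^\top \mathbf{x} \right).$$

The generic first partial derivative is

$$\frac{\partial \ln f(\mathbf{x})}{\partial x_i} = \frac{a_i}{x_i} - \frac{a_n}{x_n} + \sum_{j=1}^{q} \frac{b_j (A_{ij} - A_{nj})}{\boldsymbol{\delta}_j^\top \mathbf{x}}.$$

Then we can write the score vector as the following (this is a bonus).

$$\nabla_{n-1} \ln f(\mathbf{x} \mid \mathbf{a}, \mathbf{b}, A) = \begin{pmatrix} I_{n-1} & -\mathbf{1}_{(n-1) \times 1} \end{pmatrix}_{(n-1) \times n} \left( \frac{\mathbf{a}}{\mathbf{x}} + A\, \frac{\mathbf{b}}{A^\top \mathbf{x}} \right)$$

The generic second partial derivatives are

$$\frac{\partial^2 \ln f(\mathbf{x})}{\partial x_i^2} = -\frac{a_i}{x_i^2} - \frac{a_n}{x_n^2} - \sum_{j=1}^{q} \frac{b_j (A_{ij} - A_{nj})^2}{\left( \boldsymbol{\delta}_j^\top \mathbf{x} \right)^2},$$

$$\frac{\partial^2 \ln f(\mathbf{x})}{\partial x_i \partial x_k} = -\frac{a_n}{x_n^2} - \sum_{j=1}^{q} \frac{b_j (A_{ij} - A_{nj}) (A_{kj} - A_{nj})}{\left( \boldsymbol{\delta}_j^\top \mathbf{x} \right)^2}, \qquad i \ne k.$$

To simplify notation, denote

$$\phi_{ik} = \sum_{j=1}^{q} \frac{b_j (A_{ij} - A_{nj}) (A_{kj} - A_{nj})}{\left( \boldsymbol{\delta}_j^\top \mathbf{x} \right)^2}$$

as the ik-th element of the symmetric matrix

$$\Phi_{n-1} = \begin{pmatrix} \phi_{11} & \cdots & \phi_{1,(n-1)} \\ \vdots & \ddots & \vdots \\ \phi_{(n-1),1} & \cdots & \phi_{(n-1),(n-1)} \end{pmatrix}.$$

Then we can write the Hessian as

$$H(\mathbf{x}_{-n}) = -\operatorname{diag}\left( \frac{a_1}{x_1^2}, \ldots, \frac{a_{n-1}}{x_{n-1}^2} \right) - \frac{a_n}{x_n^2} \mathbf{1}_{(n-1) \times (n-1)} - \Phi_{n-1}. \qquad \square$$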
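
The closed form of Lemma 6.1 can be compared with a finite-difference Hessian. The Python sketch below is a check under stated assumptions rather than part of the paper's software: the function names are hypothetical, δ_j is taken to be the j-th column of A, the last coordinate is eliminated through x_n = 1 − (x_1 + ⋯ + x_{n−1}), the normalizing constant is dropped, and the toy data are invented.

```python
import math


def full_point(x_free):
    """Append x_n = 1 - sum of the free coordinates."""
    return list(x_free) + [1.0 - sum(x_free)]


def log_density(x_free, a, b, A):
    """Log-density up to the additive constant, as a function of x_1..x_{n-1}."""
    x = full_point(x_free)
    n, q = len(a), len(b)
    value = sum(a[i] * math.log(x[i]) for i in range(n))
    value += sum(b[j] * math.log(sum(A[i][j] * x[i] for i in range(n)))
                 for j in range(q))
    return value


def analytic_hessian(x_free, a, b, A):
    """H = -diag(a_i / x_i^2) - (a_n / x_n^2) * ones - Phi, as in Lemma 6.1."""
    x = full_point(x_free)
    n, q = len(a), len(b)
    dx = [sum(A[i][j] * x[i] for i in range(n)) for j in range(q)]
    H = [[0.0] * (n - 1) for _ in range(n - 1)]
    for i in range(n - 1):
        for k in range(n - 1):
            phi = sum(b[j] * (A[i][j] - A[n - 1][j]) * (A[k][j] - A[n - 1][j])
                      / dx[j] ** 2 for j in range(q))
            H[i][k] = -a[n - 1] / x[n - 1] ** 2 - phi
            if i == k:
                H[i][k] -= a[i] / x[i] ** 2
    return H


def numeric_hessian(x_free, a, b, A, h=1e-4):
    """Central second differences of log_density."""
    m = len(x_free)

    def shifted(i, di, k, dk):
        y = list(x_free)
        y[i] += di
        y[k] += dk
        return log_density(y, a, b, A)

    return [[(shifted(i, h, k, h) - shifted(i, h, k, -h)
              - shifted(i, -h, k, h) + shifted(i, -h, k, -h)) / (4.0 * h * h)
             for k in range(m)] for i in range(m)]


# Toy data (assumption): density proportional to x1^2 x2^3 x3^5 (x1 + x2)^5.
a = [2.0, 3.0, 5.0]
b = [5.0]
A = [[1], [1], [0]]
x_free = [0.2, 0.3]

H = analytic_hessian(x_free, a, b, A)
N = numeric_hessian(x_free, a, b, A)
```

The negative inverse of `H`, evaluated at the mode, is the usual basis for the asymptotic variance estimate mentioned above.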
6.4. Algebraically solve the TSA polynomial system. Going back to the philosophical claim of this paper,
i.e., the eigenreconstruction of p from p, we note that underpinning this philosophy is the system of polynomial
equations that abide by the algebraic rules 2.1. The references for employing computational commutative algebra
to attack the system include, but are not limited to: Pistone et al. [2000], Cox et al. [2006], Sturmfels [2002].

7 Software supplements 

Online and desktop implementations in Microsoft Excel 2007+ and MATLAB 2009a (version 7.8)+ are available
at http://hku.hk/jdong/eigenstruct2013a.html.